1 code implementation • 18 Mar 2025 • Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney von Arx, Ryan Bloom, Thomas Broadley, Haoxing Du, Brian Goodrich, Nikola Jurkovic, Luke Harold Miles, Seraphina Nix, Tao Lin, Neev Parikh, David Rein, Lucas Jun Koba Sato, Hjalmar Wijk, Daniel M. Ziegler, Elizabeth Barnes, Lawrence Chan
Despite rapid progress on AI benchmarks, the real-world meaning of benchmark performance remains unclear.
2 code implementations • 22 Nov 2024 • Hjalmar Wijk, Tao Lin, Joel Becker, Sami Jawhar, Neev Parikh, Thomas Broadley, Lawrence Chan, Michael Chen, Josh Clymer, Jai Dhyani, Elena Ericheva, Katharyn Garcia, Brian Goodrich, Nikola Jurkovic, Megan Kinniment, Aron Lajko, Seraphina Nix, Lucas Sato, William Saunders, Maksym Taran, Ben West, Elizabeth Barnes
We confirm that our experts make progress in the environments given 8 hours, with 82% of expert attempts achieving a non-zero score and 24% matching or exceeding our strong reference solutions.
no code implementations • 18 Dec 2023 • Megan Kinniment, Lucas Jun Koba Sato, Haoxing Du, Brian Goodrich, Max Hasin, Lawrence Chan, Luke Harold Miles, Tao R. Lin, Hjalmar Wijk, Joel Burget, Aaron Ho, Elizabeth Barnes, Paul Christiano
We find that these language model agents can only complete the easiest tasks from this list, although they make some progress on the more challenging tasks.
1 code implementation • 11 May 2022 • Hjalmar Wijk, Benjie Wang, Marta Kwiatkowska
In many domains, worst-case guarantees on the performance (e.g., prediction accuracy) of a decision function subject to distributional shifts and uncertainty about the environment are crucial.
1 code implementation • 20 Jan 2021 • Mirco Giacobbe, Mohammadhosein Hasanbeig, Daniel Kroening, Hjalmar Wijk
We present the first exact method for analysing and ensuring the safety of DRL agents for Atari games.