no code implementations • 12 Dec 2024 • Lewis Hammond, Sam Adam-Day
We consider the problem of how a trusted, but computationally bounded agent (a 'verifier') can learn to interact with one or more powerful but untrusted agents ('provers') in order to solve a given task.
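The core asymmetry in this setup is that verifying an answer can be far cheaper than finding one. As a toy illustration (hypothetical code, not from the paper): finding a factor of n is expensive for the prover, but the bounded verifier can check a claimed factor with a single modulus operation, so it never has to take the untrusted prover's answer on faith.

```python
# Toy illustration of the verifier/prover asymmetry (hypothetical names, not
# the paper's codebase): the prover does the hard work, the verifier only
# accepts answers it can check cheaply itself.

def prover_claim_factor(n: int) -> int:
    """Powerful but untrusted: does the expensive work (trial division)."""
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return d
    return n  # claims n is prime

def verifier_check(n: int, claimed_factor: int) -> bool:
    """Trusted but bounded: one modulus operation suffices to verify."""
    return 1 < claimed_factor < n and n % claimed_factor == 0

n = 589  # = 19 * 31
claim = prover_claim_factor(n)
print(verifier_check(n, claim))  # True: accepted without factoring n itself
```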
no code implementations • 18 Oct 2024 • Vojtech Kovarik, Nathaniel Sauerberg, Lewis Hammond, Vincent Conitzer
As positive results, we establish that mixed-strategy simulation can improve social welfare if the simulator has the option to scale their level of trust, if the players face challenges with both trust and coordination, or if maintaining some level of privacy is essential for enabling cooperation.
no code implementations • 17 Jun 2024 • Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible.
1 code implementation • 15 Apr 2024 • Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwani, Yoshua Bengio, Danqi Chen, Philip H. S. Torr, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs).
no code implementations • 24 Feb 2024 • Oliver Sourbut, Lewis Hammond, Harriet Wood
Many settings of interest involving humans and machines, from virtual personal assistants to autonomous vehicles, can naturally be modelled as principals (humans) delegating to agents (machines), which then interact with each other on their principals' behalf.
no code implementations • 12 Feb 2024 • Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, Christian Schroeder de Witt
In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both AI and security literature.
no code implementations • 23 Jan 2024 • Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents (systems capable of pursuing complex goals with limited supervision) may exacerbate existing societal risks and introduce new risks.
1 code implementation • 13 Oct 2023 • Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan, Jesse Clifton
The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities.
no code implementations • 11 Jul 2023 • James Fox, Matt MacDermott, Lewis Hammond, Paul Harrenstein, Alessandro Abate, Michael Wooldridge
Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks.
no code implementations • 5 Jan 2023 • Lewis Hammond, James Fox, Tom Everitt, Ryan Carey, Alessandro Abate, Michael Wooldridge
Among other contributions, we describe correspondences between causal games and other formalisms, and explain how causal games can be used to answer queries that other causal or game-theoretic models do not support.
1 code implementation • 28 Dec 2022 • Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems.
Multi-Objective Reinforcement Learning • reinforcement-learning • +1
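The defining feature of the lexicographic setting is a strict priority ordering over objectives: a policy may only trade off a lower-priority objective in favour of a higher-priority one, never the reverse. Practical algorithms typically allow a small tolerance on higher-priority objectives; the sketch below (illustrative names and values only, not the paper's algorithm) uses strict ordering to show the core idea, exploiting the fact that Python compares tuples lexicographically.

```python
# Minimal sketch of lexicographic action selection: each action has a vector
# of estimated values, one per objective, ordered from highest to lowest
# priority. max() over tuples picks the action best on the first objective,
# breaking ties with the second, and so on. (Hypothetical values.)

q_values = {
    "a1": (0.9, 0.1, 0.5),  # (safety, task reward, energy efficiency)
    "a2": (0.9, 0.7, 0.2),  # ties a1 on safety, wins on task reward
    "a3": (0.8, 0.9, 0.9),  # best on low-priority objectives, but less safe
}

best_action = max(q_values, key=q_values.get)
print(best_action)  # "a2": safety first, then task reward breaks the tie
```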
no code implementations • 30 Sep 2022 • Daniel Jarne Ornia, Licio Romao, Lewis Hammond, Manuel Mazo Jr., Alessandro Abate
Policy robustness in Reinforcement Learning may not be desirable at any cost: the deviations from otherwise optimal policies that robustness requirements induce should be explainable, quantifiable, and formally verifiable.
no code implementations • 19 Jul 2021 • Julian Gutierrez, Lewis Hammond, Anthony W. Lin, Muhammad Najib, Michael Wooldridge
Rational verification is the problem of determining which temporal logic properties will hold in a multi-agent system, under the assumption that agents in the system act rationally, by choosing strategies that collectively form a game-theoretic equilibrium.
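The distinctive move in rational verification is to quantify only over equilibrium strategy profiles rather than over all possible behaviours. The sketch below shows that equilibrium filter in its simplest form, enumerating the pure Nash equilibria of a two-player normal-form game; it is illustrative only, as the actual setting uses temporal-logic objectives over infinite plays.

```python
from itertools import product

# Toy version of the "rationality" restriction: instead of checking a
# property against every strategy profile, restrict attention to profiles
# from which no player can gain by unilaterally deviating.

payoffs = {  # (row action, col action) -> (row payoff, col payoff)
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
actions = ["C", "D"]

def is_nash(profile):
    row, col = profile
    row_ok = all(payoffs[(r, col)][0] <= payoffs[profile][0] for r in actions)
    col_ok = all(payoffs[(row, c)][1] <= payoffs[profile][1] for c in actions)
    return row_ok and col_ok

equilibria = [p for p in product(actions, actions) if is_nash(p)]
print(equilibria)  # [('D', 'D')] for this prisoner's dilemma
```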
1 code implementation • 9 Feb 2021 • Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge
Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations.
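Concretely, a MAID is a directed acyclic graph whose nodes are chance nodes plus, for each agent, decision and utility nodes; the parents of a decision node are exactly what that agent observes when choosing. A minimal hand-rolled sketch of this structure (an illustrative data layout only, unrelated to the entry's linked implementation):

```python
# A MAID's structure as plain Python data: a Bayesian network whose nodes
# are partitioned into chance nodes and, per agent, decision and utility
# nodes. Edges into a decision node encode that agent's observations.

maid = {
    "nodes": {
        "Chance": {"kind": "chance"},                # random state of the world
        "D1":     {"kind": "decision", "agent": 1},  # agent 1 moves first
        "D2":     {"kind": "decision", "agent": 2},  # agent 2 sees D1, then moves
        "U1":     {"kind": "utility",  "agent": 1},
        "U2":     {"kind": "utility",  "agent": 2},
    },
    "edges": [
        ("Chance", "D1"),                              # agent 1 observes the state
        ("D1", "D2"),                                  # agent 2 observes D1
        ("Chance", "U1"), ("D1", "U1"), ("D2", "U1"),  # payoffs depend on the
        ("Chance", "U2"), ("D1", "U2"), ("D2", "U2"),  # state and both moves
    ],
}

# The strategic structure is read directly off the graph:
for agent in (1, 2):
    ds = [n for n, attrs in maid["nodes"].items()
          if attrs["kind"] == "decision" and attrs.get("agent") == agent]
    print(agent, ds)  # 1 ['D1'] / 2 ['D2']
```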
1 code implementation • 1 Feb 2021 • Lewis Hammond, Alessandro Abate, Julian Gutierrez, Michael Wooldridge
In this paper, we study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment, which may exhibit probabilistic behaviour.
Multi-agent Reinforcement Learning • reinforcement-learning • +2
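A common recipe for this kind of problem, sketched below under the assumption of a simple reach-avoid specification (not necessarily the paper's exact construction), is to compile the temporal logic formula into a finite automaton, run it in lockstep with the environment, and reward the learners for driving it into an accepting state.

```python
# Minimal sketch of automaton-based reward shaping for a temporal logic
# specification. Spec (assumed here): "reach the goal without ever hitting
# a hazard first". The automaton tracks progress; RL agents optimise the
# emitted reward instead of the raw formula.

AUTOMATON = {  # (state, label) -> next state; accept/fail are absorbing
    ("init", "goal"):   "accept",
    ("init", "hazard"): "fail",
    ("init", "none"):   "init",
    ("accept", "goal"): "accept", ("accept", "hazard"): "accept", ("accept", "none"): "accept",
    ("fail", "goal"):   "fail",   ("fail", "hazard"):   "fail",   ("fail", "none"):   "fail",
}

def spec_reward(q: str, label: str) -> tuple[str, float]:
    """Advance the automaton on the current observation label; reward the
    learners once, on the transition into the accepting state."""
    q_next = AUTOMATON[(q, label)]
    return q_next, (1.0 if q_next == "accept" and q != "accept" else 0.0)

q = "init"
for label in ["none", "none", "goal"]:  # labels extracted from env states
    q, r = spec_reward(q, label)
print(q, r)  # accept 1.0
```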
no code implementations • 8 Oct 2018 • Lewis Hammond, Vaishak Belle
From the viewpoint of such systems, the urgent questions are: (a) How can models of moral scenarios and blameworthiness be extracted and learnt automatically from data?