2 code implementations • 27 Nov 2017 • Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg
We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.
1 code implementation • 25 Jun 2017 • Jarryd Martin, Suraj Narayanan Sasikumar, Tom Everitt, Marcus Hutter
We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state.
Ranked #13 on Atari 2600 Montezuma's Revenge (Atari Games)
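The visit-count idea behind this entry can be illustrated with a much simpler count-based bonus. This is a minimal sketch, not the paper's generalised visit-count method; `exploration_bonus` and the `beta` coefficient are illustrative names, and the state abstraction is assumed to be hashable:

```python
import math
from collections import defaultdict

def exploration_bonus(counts, state, beta=1.0):
    """Count-based exploration bonus: beta / sqrt(N(s) + 1).

    `counts` maps (an abstraction of) a state to its visit-count;
    rarely visited states get a larger bonus, encouraging exploration.
    """
    n = counts[state]
    return beta / math.sqrt(n + 1)

# Toy usage: visit one state repeatedly and watch its bonus decay.
counts = defaultdict(int)
s = ("room", 3)  # any hashable state abstraction
bonuses = []
for _ in range(4):
    bonuses.append(exploration_bonus(counts, s))
    counts[s] += 1
```

The bonus decays as the count grows, so an agent adding it to its reward estimate is driven towards states it is uncertain about.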
1 code implementation • 9 Feb 2021 • Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge
Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations.
3 code implementations • 19 Nov 2018 • Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg
One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.
1 code implementation • 23 May 2017 • Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg
Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards.
1 code implementation • 15 Feb 2021 • Eric D. Langlois, Tom Everitt
Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions.
no code implementations • 3 May 2018 • Tom Everitt, Gary Lea, Marcus Hutter
The development of Artificial General Intelligence (AGI) promises to be a major event.
no code implementations • 9 Sep 2015 • Tom Everitt, Marcus Hutter
In this paper we derive estimates for average BFS and DFS runtime.
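The kind of average-case comparison the paper formalises can be checked empirically. The sketch below is a simplified setting of my own (a complete binary tree with a single goal placed uniformly at a fixed depth, not the paper's general analysis; all names are illustrative), averaging node expansions for BFS and DFS:

```python
from collections import deque
from statistics import mean

def bfs_cost(depth_max, goal):
    """Nodes visited by BFS on a complete binary tree until `goal` is found.
    Nodes are labelled heap-style: root is 1, children of v are 2v and 2v+1."""
    frontier, visited = deque([1]), 0
    while frontier:
        v = frontier.popleft()
        visited += 1
        if v == goal:
            return visited
        if 2 * v < 2 ** (depth_max + 1):  # v is an internal node
            frontier.extend([2 * v, 2 * v + 1])
    return visited

def dfs_cost(depth_max, goal):
    """Nodes visited by depth-first search (left child first) until `goal`."""
    stack, visited = [1], 0
    while stack:
        v = stack.pop()
        visited += 1
        if v == goal:
            return visited
        if 2 * v < 2 ** (depth_max + 1):
            stack.extend([2 * v + 1, 2 * v])  # push right first so left pops first
    return visited

# Average cost with a single goal placed uniformly at depth d of a depth-D tree.
D, d = 10, 6
goals = range(2 ** d, 2 ** (d + 1))  # the 2^d nodes at depth d
avg_bfs = mean(bfs_cost(D, g) for g in goals)
avg_dfs = mean(dfs_cost(D, g) for g in goals)
```

With the goal well above the tree's maximum depth, BFS is cheaper on average because DFS commits to exploring deep subtrees before backtracking.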
no code implementations • 16 Aug 2016 • Tom Everitt, Tor Lattimore, Marcus Hutter
Function optimisation is a major challenge in computer science.
no code implementations • 2 Jun 2016 • Jarryd Martin, Tom Everitt, Marcus Hutter
Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics.
no code implementations • 10 May 2016 • Tom Everitt, Marcus Hutter
Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem.
no code implementations • 10 May 2016 • Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter
As we continue to create more and more intelligent agents, chances increase that they will learn about this ability.
no code implementations • 24 Jun 2015 • Tom Everitt, Jan Leike, Marcus Hutter
Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward.
no code implementations • 26 Feb 2019 • Tom Everitt, Pedro A. Ortega, Elizabeth Barnes, Shane Legg
Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentive to observe, and (2) which nodes can the agent have an incentive to control?
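A heavily simplified version of such a graphical check can be sketched as plain reachability on the diagram's DAG. This is an assumption-laden illustration, not the paper's criterion: it only captures the necessary condition that a node can carry a control incentive if it lies on a directed path from the decision to the utility node, and `control_incentive_candidates` is a hypothetical helper:

```python
def descendants(graph, start):
    """All nodes reachable from `start` via directed edges (including start)."""
    seen, stack = {start}, [start]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def control_incentive_candidates(graph, decision, utility):
    """Nodes on a directed path decision -> node -> utility.

    Simplified necessary condition: the agent can only have an incentive
    to control X if its decision influences X and X influences its utility.
    """
    from_decision = descendants(graph, decision)
    return {x for x in from_decision
            if utility in descendants(graph, x) and x not in (decision, utility)}

# Toy diagram: D -> X -> U, and D -> Y where Y does not affect the utility U.
cid = {"D": ["X", "Y"], "X": ["U"], "Y": [], "U": []}
candidates = control_incentive_candidates(cid, "D", "U")
```

In the toy diagram only `X` qualifies: the agent's decision influences `Y` as well, but since `Y` never feeds into the utility node, controlling it cannot pay off.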
no code implementations • 20 Jun 2019 • Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg
Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other.
no code implementations • 13 Aug 2019 • Tom Everitt, Marcus Hutter, Ramana Kumar, Victoria Krakovna
Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding?
no code implementations • 20 Jan 2020 • Ryan Carey, Eric Langlois, Tom Everitt, Shane Legg
Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to?
no code implementations • 17 Nov 2020 • Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg
How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent?
no code implementations • 17 Nov 2020 • Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg
Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e.g., rewards).
no code implementations • 2 Feb 2021 • Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg
We propose a new graphical criterion for value of control, establishing its soundness and completeness.
no code implementations • 26 Mar 2021 • Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik, Geoffrey Irving
For artificial intelligence to be beneficial to humans, the behaviour of AI agents needs to be aligned with what humans want.
no code implementations • 20 Oct 2021 • Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg
The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains.
no code implementations • 22 Feb 2022 • Carolyn Ashurst, Ryan Carey, Silvia Chiappa, Tom Everitt
In addition to reproducing discriminatory relationships in the training data, machine learning systems can also introduce or amplify discriminatory effects.
no code implementations • 23 Feb 2022 • Chris van Merwijk, Ryan Carey, Tom Everitt
Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems.
no code implementations • 21 Apr 2022 • Sebastian Farquhar, Ryan Carey, Tom Everitt
Using Causal Influence Diagram analysis, we then train agents to maximize the causal effect of actions on the expected return that is not mediated by the delicate parts of state.
no code implementations • 17 Aug 2022 • Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt
Causal models of agents have been used to analyse the safety aspects of machine learning systems.
no code implementations • 5 Jan 2023 • Lewis Hammond, James Fox, Tom Everitt, Ryan Carey, Alessandro Abate, Michael Wooldridge
Regarding question iii), we describe correspondences between causal games and other formalisms, and explain how causal games can be used to answer queries that other causal or game-theoretic models do not support.
no code implementations • 31 May 2023 • Ryan Carey, Tom Everitt
How can humans stay in control of advanced artificial intelligence systems?
no code implementations • 20 Jul 2023 • Matt MacDermott, Tom Everitt, Francesco Belardinelli
How should my own decisions affect my beliefs about the outcomes I expect to achieve?
no code implementations • NeurIPS 2023 • Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt
There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games.
no code implementations • 11 Feb 2024 • Francis Rhys Ward, Matt MacDermott, Francesco Belardinelli, Francesca Toni, Tom Everitt
In addition, we show how our definition relates to past concepts, including actual causality and the notion of instrumental goals, a core idea in the literature on safe AI agents.
no code implementations • 16 Feb 2024 • Jonathan Richens, Tom Everitt
It has long been hypothesised that causal reasoning plays a fundamental role in robust and general intelligence.