Search Results for author: Tom Everitt

Found 25 papers, 6 papers with code

Path-Specific Objectives for Safer Agent Incentives

no code implementations • 21 Apr 2022 • Sebastian Farquhar, Ryan Carey, Tom Everitt

Using causal influence diagram analysis, we then train agents to maximize the causal effect of their actions on the expected return that is not mediated by the delicate parts of the state.

A Complete Criterion for Value of Information in Soluble Influence Diagrams

no code implementations • 23 Feb 2022 • Chris van Merwijk, Ryan Carey, Tom Everitt

Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems.


Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness

no code implementations • 22 Feb 2022 • Carolyn Ashurst, Ryan Carey, Silvia Chiappa, Tom Everitt

In addition to reproducing discriminatory relationships in the training data, machine learning systems can also introduce or amplify discriminatory effects.

Alignment of Language Agents

no code implementations • 26 Mar 2021 • Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik, Geoffrey Irving

For artificial intelligence to be beneficial to humans, the behaviour of AI agents needs to be aligned with what humans want.

How RL Agents Behave When Their Actions Are Modified

1 code implementation • 15 Feb 2021 • Eric D. Langlois, Tom Everitt

Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions.


Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

1 code implementation • 9 Feb 2021 • Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge

Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations.
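As a toy illustration (this is not the paper's actual tooling, and all names are invented), a MAID's graph can be held in plain Python structures, with each node labelled as a chance, decision, or utility node and tagged with the agent it belongs to:

```python
# A tiny two-agent MAID: D1/D2 are the agents' decisions, U1/U2 their
# utility nodes, X a chance node; edges point from parent to child.
maid = {
    "nodes": {"X": "chance", "D1": "decision:1", "D2": "decision:2",
              "U1": "utility:1", "U2": "utility:2"},
    "edges": [("X", "D1"), ("D1", "D2"), ("D1", "U1"),
              ("D2", "U1"), ("D2", "U2")],
}

def parents(maid, node):
    """Parents of a node: what a decision observes, or what a
    chance/utility node depends on."""
    return [a for a, b in maid["edges"] if b == node]

u1_parents = parents(maid, "U1")  # both agents' decisions feed agent 1's utility
```

Here agent 1's utility depends on both decisions while agent 2's depends only on its own, which is the kind of structural fact MAID analyses read off the graph.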

Agent Incentives: A Causal Perspective

no code implementations • 2 Feb 2021 • Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg

We propose a new graphical criterion for value of control, establishing its soundness and completeness.


REALab: An Embedded Perspective on Tampering

no code implementations • 17 Nov 2020 • Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg

Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e.g., rewards).


Avoiding Tampering Incentives in Deep RL via Decoupled Approval

no code implementations • 17 Nov 2020 • Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent?

The Incentives that Shape Behaviour

no code implementations • 20 Jan 2020 • Ryan Carey, Eric Langlois, Tom Everitt, Shane Legg

Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to?


Modeling AGI Safety Frameworks with Causal Influence Diagrams

no code implementations • 20 Jun 2019 • Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg

Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other.

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

no code implementations • 26 Feb 2019 • Tom Everitt, Pedro A. Ortega, Elizabeth Barnes, Shane Legg

Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentive to observe, and (2) which nodes can the agent have an incentive to control?


Scalable agent alignment via reward modeling: a research direction

3 code implementations • 19 Nov 2018 • Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.

Tasks: Atari Games, Reinforcement Learning
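Reward modeling learns a reward function from human feedback rather than hand-specifying it. One common formulation (a generic sketch, not necessarily this paper's exact setup) fits a reward model to pairwise preferences with a Bradley-Terry model, where a hypothetical linear model scores trajectory features:

```python
import math

def reward(features, w):
    # Hypothetical linear reward model over trajectory features.
    return sum(wi * fi for wi, fi in zip(w, features))

def preference_loss(traj_a, traj_b, w):
    """Negative log-likelihood that trajectory A is preferred over B,
    under a Bradley-Terry model of the human's choices."""
    diff = reward(traj_a, w) - reward(traj_b, w)
    p_prefer_a = 1.0 / (1.0 + math.exp(-diff))
    return -math.log(p_prefer_a)

# Toy check: weights that favour A's features give lower loss on the
# judgement "A preferred over B" than an uninformative model does.
traj_a, traj_b = [1.0, 0.0], [0.0, 1.0]
loss_good = preference_loss(traj_a, traj_b, w=[1.0, 0.0])
loss_flat = preference_loss(traj_a, traj_b, w=[0.0, 0.0])
```

Minimising this loss over many human judgements yields a reward model that an RL agent can then be trained against.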

AGI Safety Literature Review

no code implementations • 3 May 2018 • Tom Everitt, Gary Lea, Marcus Hutter

The development of Artificial General Intelligence (AGI) promises to be a major event.

AI Safety Gridworlds

2 code implementations • 27 Nov 2017 • Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.

Tasks: Reinforcement Learning, Safe Exploration
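In the same spirit (an illustrative toy, not one of the paper's actual environments), a minimal gridworld with an unsafe cell takes only a few lines:

```python
# 'S' start, '.' free, 'L' lava (unsafe), 'G' goal.
GRID = ["S.L",
        "..G"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, move):
    """Move within the grid; returns (new_pos, reward, done)."""
    r, c = pos
    dr, dc = MOVES[move]
    r = max(0, min(len(GRID) - 1, r + dr))
    c = max(0, min(len(GRID[0]) - 1, c + dc))
    cell = GRID[r][c]
    if cell == "L":
        return (r, c), -50.0, True   # unsafe terminal state: large penalty
    if cell == "G":
        return (r, c), 10.0, True    # reaching the goal
    return (r, c), -1.0, False       # small step cost otherwise
```

An agent starting at (0, 0) that greedily moves right walks into the lava, while detouring down and right reaches the goal, so the environment distinguishes safe from unsafe behaviour.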

Count-Based Exploration in Feature Space for Reinforcement Learning

1 code implementation • 25 Jun 2017 • Jarryd Martin, Suraj Narayanan Sasikumar, Tom Everitt, Marcus Hutter

We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state.

Tasks: Atari Games, Efficient Exploration +1
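The visit-count idea can be illustrated with a toy sketch (all names here are invented for illustration, and the feature map stands in for the paper's generalised visit-count): states are hashed to feature buckets, and the exploration bonus decays as one over the square root of the bucket's count:

```python
import math
from collections import defaultdict

def phi(state):
    # Hypothetical feature map: bucket a scalar state into coarse bins.
    return int(state) // 10

class CountBasedBonus:
    """Exploration bonus that decays with the visit count of the
    state's feature bucket, so novel regions look more rewarding."""

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        key = phi(state)
        self.counts[key] += 1
        # Bonus shrinks as the feature bucket becomes familiar.
        return self.beta / math.sqrt(self.counts[key])

b = CountBasedBonus(beta=1.0)
first = b.bonus(5)   # unvisited bucket: maximal bonus
second = b.bonus(7)  # same bucket (7 // 10 == 0): smaller bonus
```

Adding such a bonus to the environment reward steers the agent toward states whose features it has rarely seen.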

Reinforcement Learning with a Corrupted Reward Channel

1 code implementation • 23 May 2017 • Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg

Traditional RL methods fare poorly in corrupt reward MDPs (CRMDPs), even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards.
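The core difficulty can be sketched with a toy wrapper (an illustration of the corrupt-reward idea, not the paper's formal CRMDP definition): in a designated set of states, the reward the agent observes is decoupled from the true reward.

```python
class ToyEnv:
    """Stand-in environment: the next state equals the action taken,
    and the true reward is always zero."""
    def step(self, action):
        return action, 0.0

class CorruptRewardWrapper:
    """Corrupts the observed reward in a designated set of states --
    a toy stand-in for a corrupt reward MDP."""

    def __init__(self, env, corrupt_states, fake_reward=1.0):
        self.env = env
        self.corrupt_states = corrupt_states
        self.fake_reward = fake_reward

    def step(self, action):
        state, true_reward = self.env.step(action)
        # The agent only ever sees the observed reward; in corrupt
        # states it no longer reflects the true reward.
        observed = self.fake_reward if state in self.corrupt_states else true_reward
        return state, observed

env = CorruptRewardWrapper(ToyEnv(), corrupt_states={3})
_, r_honest = env.step(0)   # observed reward matches the true reward
_, r_corrupt = env.step(3)  # observed reward is spuriously high
```

An agent maximising observed reward here learns to seek out state 3, even though its true reward there is zero, which is exactly the failure mode the paper studies.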


Free Lunch for Optimisation under the Universal Distribution

no code implementations • 16 Aug 2016 • Tom Everitt, Tor Lattimore, Marcus Hutter

Function optimisation is a major challenge in computer science.

Death and Suicide in Universal Artificial Intelligence

no code implementations • 2 Jun 2016 • Jarryd Martin, Tom Everitt, Marcus Hutter

Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics.


Avoiding Wireheading with Value Reinforcement Learning

no code implementations • 10 May 2016 • Tom Everitt, Marcus Hutter

Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem.


Self-Modification of Policy and Utility Function in Rational Agents

no code implementations • 10 May 2016 • Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter

As we continue to create more and more intelligent agents, the chances increase that they will learn about this ability to self-modify.

Tasks: General Reinforcement Learning

Sequential Extensions of Causal and Evidential Decision Theory

no code implementations • 24 Jun 2015 • Tom Everitt, Jan Leike, Marcus Hutter

Moving beyond the dualistic view in AI, where agent and environment are separated, incurs new challenges for decision making, as the calculation of expected utility is no longer straightforward.

Tasks: Decision Making
