Search Results for author: Tom Everitt

Found 32 papers, 6 papers with code

Robust agents learn causal world models

no code implementations16 Feb 2024 Jonathan Richens, Tom Everitt

It has long been hypothesised that causal reasoning plays a fundamental role in robust and general intelligence.

Causal Inference Transfer Learning

The Reasons that Agents Act: Intention and Instrumental Goals

no code implementations11 Feb 2024 Francis Rhys Ward, Matt MacDermott, Francesco Belardinelli, Francesca Toni, Tom Everitt

In addition, we show how our definition relates to past concepts, including actual causality, and the notion of instrumental goals, which is a core idea in the literature on safe AI agents.

Philosophy

Honesty Is the Best Policy: Defining and Mitigating AI Deception

no code implementations NeurIPS 2023 Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt

There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games.

Philosophy

Characterising Decision Theories with Mechanised Causal Graphs

no code implementations20 Jul 2023 Matt MacDermott, Tom Everitt, Francesco Belardinelli

How should my own decisions affect my beliefs about the outcomes I expect to achieve?

Human Control: Definitions and Algorithms

no code implementations31 May 2023 Ryan Carey, Tom Everitt

How can humans stay in control of advanced artificial intelligence systems?

Reasoning about Causality in Games

no code implementations5 Jan 2023 Lewis Hammond, James Fox, Tom Everitt, Ryan Carey, Alessandro Abate, Michael Wooldridge

Regarding question iii), we describe correspondences between causal games and other formalisms, and explain how causal games can be used to answer queries that other causal or game-theoretic models do not support.

Discovering Agents

no code implementations17 Aug 2022 Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

Causal models of agents have been used to analyse the safety aspects of machine learning systems.

Causal Discovery

Path-Specific Objectives for Safer Agent Incentives

no code implementations21 Apr 2022 Sebastian Farquhar, Ryan Carey, Tom Everitt

Using causal influence diagram analysis, we then train agents to maximize the causal effect of actions on the expected return that is not mediated by the delicate parts of state.

A Complete Criterion for Value of Information in Soluble Influence Diagrams

no code implementations23 Feb 2022 Chris van Merwijk, Ryan Carey, Tom Everitt

Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems.

Fairness

Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness

no code implementations22 Feb 2022 Carolyn Ashurst, Ryan Carey, Silvia Chiappa, Tom Everitt

In addition to reproducing discriminatory relationships in the training data, machine learning systems can also introduce or amplify discriminatory effects.

Attribute

Alignment of Language Agents

no code implementations26 Mar 2021 Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik, Geoffrey Irving

For artificial intelligence to be beneficial to humans, the behaviour of AI agents needs to be aligned with what humans want.

How RL Agents Behave When Their Actions Are Modified

1 code implementation15 Feb 2021 Eric D. Langlois, Tom Everitt

Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions.

Reinforcement Learning (RL)

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

1 code implementation9 Feb 2021 Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge

Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations.

Agent Incentives: A Causal Perspective

no code implementations2 Feb 2021 Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg

We propose a new graphical criterion for value of control, establishing its soundness and completeness.

Fairness

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

no code implementations17 Nov 2020 Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent?

REALab: An Embedded Perspective on Tampering

no code implementations17 Nov 2020 Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg

Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e.g., rewards).

Reinforcement Learning (RL)

The Incentives that Shape Behaviour

no code implementations20 Jan 2020 Ryan Carey, Eric Langlois, Tom Everitt, Shane Legg

Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to?

Fairness

Modeling AGI Safety Frameworks with Causal Influence Diagrams

no code implementations20 Jun 2019 Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg

Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other.

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

no code implementations26 Feb 2019 Tom Everitt, Pedro A. Ortega, Elizabeth Barnes, Shane Legg

Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentive to observe, and (2) which nodes can the agent have an incentive to control?

Reinforcement Learning (RL)
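The graphical criteria in this line of work are read off the structure of the influence diagram. As a minimal illustrative sketch (not the paper's full criterion), a node can carry a control incentive only if it has a directed path to the utility node; the graph encoding and node names below are hypothetical.

```python
def has_directed_path(graph, start, goal):
    """Depth-first search for a directed path in a DAG given as {node: [children]}."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

# Toy influence diagram: S = state, D = decision, U = utility, N = noise.
graph = {"S": ["D", "U"], "D": ["U"], "N": ["S"]}

# Necessary condition for a control incentive on X: a directed path X -> U.
can_control_S = has_directed_path(graph, "S", "U")
can_control_N = has_directed_path(graph, "N", "U")  # via S
```

This captures only the basic necessary condition; the paper's sound and complete criteria additionally account for what the decision can influence and observe.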

Scalable agent alignment via reward modeling: a research direction

3 code implementations19 Nov 2018 Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.

Atari Games reinforcement-learning +1
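Reward modeling learns a reward function from user feedback and then optimises agent behaviour against it. One standard instantiation of such feedback, assumed here purely for illustration, is pairwise trajectory comparison scored with a Bradley-Terry model; `preference_loss` is a hypothetical helper, not the paper's API.

```python
import math

def preference_loss(r_a, r_b, pref_a):
    """Logistic (Bradley-Terry) loss for fitting a reward model to pairwise
    comparisons: P(a preferred over b) = sigmoid(r_a - r_b).

    r_a, r_b: predicted returns of the two trajectories.
    pref_a:   human label, 1.0 if trajectory a was preferred, else 0.0.
    """
    p_a = 1.0 / (1.0 + math.exp(-(r_a - r_b)))
    eps = 1e-12  # guard against log(0)
    return -(pref_a * math.log(p_a + eps) + (1.0 - pref_a) * math.log(1.0 - p_a + eps))
```

A reward model trained this way drives the loss down by assigning higher predicted return to preferred trajectories, which is the gradient signal the agent's reward function is fit to.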

AGI Safety Literature Review

no code implementations3 May 2018 Tom Everitt, Gary Lea, Marcus Hutter

The development of Artificial General Intelligence (AGI) promises to be a major event.

AI Safety Gridworlds

2 code implementations27 Nov 2017 Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.

Reinforcement Learning (RL) +1

Count-Based Exploration in Feature Space for Reinforcement Learning

1 code implementation25 Jun 2017 Jarryd Martin, Suraj Narayanan Sasikumar, Tom Everitt, Marcus Hutter

We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state.

Atari Games Efficient Exploration +2
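The paper generalises visit-counts to feature space; the tabular special case below sketches the familiar count-based bonus r_bonus = beta / sqrt(N(s)). The `CountBonus` class and the beta value are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import defaultdict

class CountBonus:
    """Tabular count-based exploration bonus: beta / sqrt(N(s)).

    Rarely visited states receive a larger bonus, encouraging the agent
    to reduce its uncertainty about them.
    """
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        self.counts[state] += 1  # record the visit
        return self.beta / math.sqrt(self.counts[state])
```

In the paper, the raw state count is replaced by a generalised visit-count derived from a density model over features, so that the bonus also shrinks for states similar to ones already visited.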

Reinforcement Learning with a Corrupted Reward Channel

1 code implementation23 May 2017 Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg

Traditional RL methods fare poorly in corrupt reward MDPs (CRMDPs), even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards.

Reinforcement Learning (RL)
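A toy sketch of why a corrupted reward channel misleads greedy maximisation, plus one illustrative mitigation in the spirit of the paper's suggestion to combine several reward signals. The median filter and all values here are simplifications for illustration, not the paper's algorithm.

```python
import statistics

# True rewards per state; state 'c' is corrupt: its observed reward is
# inflated far above its true value.
true_reward = {"a": 0.3, "b": 0.6, "c": 0.1}
observed = {"a": 0.3, "b": 0.6, "c": 1.0}  # single corrupted channel

# A greedy learner trusting the single channel picks the corrupt state.
greedy_choice = max(observed, key=observed.get)

# Illustrative mitigation: several independent reward channels, at most
# one corrupt per state, combined with the median to filter corruption.
channels = {
    "a": [0.3, 0.3, 0.3],
    "b": [0.6, 0.6, 0.6],
    "c": [0.1, 1.0, 0.1],  # one corrupt channel
}
robust = {s: statistics.median(rs) for s, rs in channels.items()}
robust_choice = max(robust, key=robust.get)
```

The greedy choice lands on the state whose channel was tampered with, while the median over channels recovers the state that is actually best under the true reward.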

Free Lunch for Optimisation under the Universal Distribution

no code implementations16 Aug 2016 Tom Everitt, Tor Lattimore, Marcus Hutter

Function optimisation is a major challenge in computer science.

Death and Suicide in Universal Artificial Intelligence

no code implementations2 Jun 2016 Jarryd Martin, Tom Everitt, Marcus Hutter

Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics.

Reinforcement Learning (RL)

Avoiding Wireheading with Value Reinforcement Learning

no code implementations10 May 2016 Tom Everitt, Marcus Hutter

Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem.

Reinforcement Learning (RL)

Self-Modification of Policy and Utility Function in Rational Agents

no code implementations10 May 2016 Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter

As we continue to create more and more intelligent agents, chances increase that they will learn about their ability to self-modify.

General Reinforcement Learning

Sequential Extensions of Causal and Evidential Decision Theory

no code implementations24 Jun 2015 Tom Everitt, Jan Leike, Marcus Hutter

Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward.

Decision Making
