Search Results for author: Shane Legg

Found 41 papers, 15 papers with code

Levels of AGI: Operationalizing Progress on the Path to AGI

no code implementations 4 Nov 2023 Meredith Ringel Morris, Jascha Sohl-Dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg

With these principles in mind, we propose 'Levels of AGI' based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology.

Autonomous Driving

The Hydra Effect: Emergent Self-repair in Language Model Computations

no code implementations 28 Jul 2023 Thomas McGrath, Matthew Rahtz, Janos Kramar, Vladimir Mikulik, Shane Legg

We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (which we term the Hydra effect) and (2) a counterbalancing function of late MLP layers that act to downregulate the maximum-likelihood token.

Language Modelling
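The compensation motif can be made concrete with a toy residual stream (a hypothetical construction for illustration, not the paper's code): a later "backup" layer writes into the stream only when an earlier layer's contribution has been ablated, so the output survives the ablation.

```python
# Toy residual stream illustrating the Hydra effect: zero-ablating one
# layer's write causes a later layer to compensate.

def attn_layer(resid):
    # hypothetical attention layer that always writes +2.0
    return 2.0

def backup_layer(resid):
    # hypothetical later layer: fires only when the earlier write is missing
    return 2.0 if resid < 1.0 else 0.0

def forward(ablate_attn=False):
    resid = 0.0
    resid += 0.0 if ablate_attn else attn_layer(resid)
    resid += backup_layer(resid)
    return resid

print(forward(ablate_attn=False))  # 2.0
print(forward(ablate_attn=True))   # 2.0 -- the backup layer self-repairs
```

In a purely feed-forward sum of contributions there would be no such adaptation, which is what makes the compensation observed in real language models notable.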

Beyond Bayes-optimality: meta-learning what you know you don't know

no code implementations 30 Sep 2022 Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Tim Genewein, Elliot Catt, Kevin Li, Anian Ruoss, Chris Cundy, Joel Veness, Jane Wang, Marcus Hutter, Christopher Summerfield, Shane Legg, Pedro Ortega

This is in contrast to risk-sensitive agents, which additionally exploit the higher-order moments of the return, and ambiguity-sensitive agents, which act differently when recognizing situations in which they lack knowledge.

Decision Making · Meta-Learning

Your Policy Regularizer is Secretly an Adversary

no code implementations 23 Mar 2022 Rob Brekelmans, Tim Genewein, Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Shane Legg, Pedro Ortega

Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy.
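As a quick illustration of what maximum entropy regularization does: the regularized objective E_pi[Q] + tau * H(pi) has a well-known softmax closed form, which keeps probability mass on near-optimal actions rather than committing to a single one. This is a standard result, not code from the paper:

```python
import numpy as np

def softmax_policy(q, tau):
    """Solution of max_pi E_pi[Q] + tau * H(pi): pi(a) proportional to exp(Q(a) / tau)."""
    z = np.exp((q - q.max()) / tau)   # subtract max for numerical stability
    return z / z.sum()

def entropy(p):
    return -(p * np.log(p)).sum()

q = np.array([1.0, 0.5, 0.0])
nearly_greedy = softmax_policy(q, tau=0.1)    # weak regularization
nearly_uniform = softmax_policy(q, tau=10.0)  # strong regularization

assert nearly_greedy.argmax() == 0            # still prefers the best action
assert entropy(nearly_uniform) > entropy(nearly_greedy)
```

The temperature tau controls the robustness/performance trade-off that the paper reinterprets through an adversarial lens.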

Safe Deep RL in 3D Environments using Human Feedback

no code implementations 20 Jan 2022 Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike

In this paper we answer this question in the affirmative, using ReQueST to train an agent to perform a 3D first-person object collection task using data entirely from human contractors.

Model-Free Risk-Sensitive Reinforcement Learning

no code implementations 4 Nov 2021 Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A. Ortega

Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.

Decision Making · reinforcement-learning +1
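The certainty-equivalent mentioned in the abstract has a simple closed form for Gaussian returns, which can be checked by Monte Carlo. This is a sketch under the standard exponential-utility definition, not the paper's learning rule:

```python
import numpy as np

def free_energy(mu, sigma2, beta):
    """Certainty equivalent -(1/beta) * log E[exp(-beta * R)] for a Gaussian
    return R ~ N(mu, sigma2): equals mu - (beta / 2) * sigma2, so beta > 0
    trades expected return against variance (risk aversion)."""
    return mu - 0.5 * beta * sigma2

# A risk-averse agent (beta = 0.5) prefers a sure 1.0 over N(1.1, 4.0).
assert free_energy(1.0, 0.0, beta=0.5) > free_energy(1.1, 4.0, beta=0.5)

# Monte Carlo check of the closed form.
rng = np.random.default_rng(0)
samples = rng.normal(1.1, 2.0, size=200_000)
mc = -np.log(np.mean(np.exp(-0.5 * samples))) / 0.5
assert abs(mc - free_energy(1.1, 4.0, beta=0.5)) < 0.05
```

The paper's contribution is a model-free stochastic approximation of this quantity; the closed form above is just the target it estimates.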

Stochastic Approximation of Gaussian Free Energy for Risk-Sensitive Reinforcement Learning

no code implementations NeurIPS 2021 Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A Ortega

Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.

Decision Making · reinforcement-learning +1

Agent Incentives: A Causal Perspective

no code implementations 2 Feb 2021 Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg

We propose a new graphical criterion for value of control, establishing its soundness and completeness.

Fairness

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

no code implementations 17 Nov 2020 Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent?

REALab: An Embedded Perspective on Tampering

no code implementations 17 Nov 2020 Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg

Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e.g., rewards).

Reinforcement Learning (RL)

Meta-trained agents implement Bayes-optimal agents

no code implementations NeurIPS 2020 Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega

Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution.

Meta-Learning

Avoiding Side Effects By Considering Future Tasks

no code implementations NeurIPS 2020 Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg

To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default.

Quantifying Differences in Reward Functions

1 code implementation ICLR 2021 Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike

However, this method cannot distinguish between the learned reward function failing to reflect user preferences and the policy optimization process failing to optimize the learned reward.
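The paper's proposed alternative (EPIC) compares reward functions directly, without training a policy; at its core is a Pearson-correlation distance evaluated on a batch of transitions. The sketch below shows that component only, omitting the canonicalization step that removes reward shaping (an illustrative reconstruction, not the authors' implementation):

```python
import numpy as np

def pearson_distance(ra, rb):
    """Distance in [0, 1] from the Pearson correlation of two reward vectors
    evaluated on the same batch of transitions: d = sqrt((1 - rho) / 2).
    Invariant to positive affine rescaling of either reward."""
    rho = np.corrcoef(ra, rb)[0, 1]
    return np.sqrt(max((1.0 - rho) / 2.0, 0.0))  # clamp guards float error

rng = np.random.default_rng(0)
r = rng.normal(size=1000)                 # "true" rewards on sampled transitions

assert pearson_distance(r, 3.0 * r + 1.0) < 1e-6  # rescaled reward: distance ~0
assert pearson_distance(r, -r) > 0.999            # sign-flipped: maximal distance
```

Because the distance ignores affine rescaling, it captures disagreements that actually change which policies are optimal.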

Pitfalls of learning a reward function online

no code implementations 28 Apr 2020 Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg

We formally introduce two desirable properties: the first is 'unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise.

The Incentives that Shape Behaviour

no code implementations 20 Jan 2020 Ryan Carey, Eric Langlois, Tom Everitt, Shane Legg

Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to?

Fairness

Learning Human Objectives by Evaluating Hypothetical Behavior

1 code implementation ICML 2020 Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike

To address this challenge, we propose an algorithm that safely and interactively learns a model of the user's reward function.

Car Racing

Modeling AGI Safety Frameworks with Causal Influence Diagrams

no code implementations 20 Jun 2019 Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg

Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other.

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

no code implementations 26 Feb 2019 Tom Everitt, Pedro A. Ortega, Elizabeth Barnes, Shane Legg

Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentive to observe, and (2) which nodes can the agent have an incentive to control?

Reinforcement Learning (RL)

Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

no code implementations 8 Jan 2019 Laurent Orseau, Tor Lattimore, Shane Legg

We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.
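The flavor of this setting can be sketched with a Prod-style multiplicative update that mixes expert forecasters under log-loss. This is a hypothetical sketch of the update family, not the paper's exact algorithm or its guarantees:

```python
import numpy as np

def mix_experts(expert_probs, eta=0.1):
    """Prod-style update for mixing expert forecasters under log-loss.
    expert_probs[t, i] = probability expert i assigned to the outcome at t."""
    T, n = expert_probs.shape
    w = np.full(n, 1.0 / n)            # uniform prior over experts
    total_log_loss = 0.0
    for t in range(T):
        p = w @ expert_probs[t]        # mixture probability of the outcome
        total_log_loss += -np.log(p)
        # multiplicative update; the weights remain normalized because
        # sum_i w_i * p_i / p = 1 when p is the mixture probability
        w = w * (1.0 - eta + eta * expert_probs[t] / p)
    return w, total_log_loss

# Two experts predicting a binary outcome that is always 1:
# expert 0 assigns it probability 0.9, expert 1 assigns 0.5.
probs = np.tile(np.array([0.9, 0.5]), (50, 1))
w, loss = mix_experts(probs)

assert w[0] > w[1]                     # the better expert accumulates weight
assert np.isclose(w.sum(), 1.0)
```

The point of such updates is efficiency and robustness: each step costs O(n) and avoids the numerical blow-ups that plague some classical log-loss aggregation schemes.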

Scaling shared model governance via model splitting

no code implementations ICLR 2019 Miljan Martic, Jan Leike, Andrew Trask, Matteo Hessel, Shane Legg, Pushmeet Kohli

Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and secure multiparty computation.

reinforcement-learning · Reinforcement Learning (RL)

Scalable agent alignment via reward modeling: a research direction

3 code implementations 19 Nov 2018 Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.

Atari Games · reinforcement-learning +1

Modeling Friends and Foes

no code implementations 30 Jun 2018 Pedro A. Ortega, Shane Legg

How can one detect friendly and adversarial behavior from raw data?

Penalizing side effects using stepwise relative reachability

no code implementations 4 Jun 2018 Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg

How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment?

Safe Reinforcement Learning
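The core idea can be sketched as a penalty on reductions in state reachability relative to a baseline policy (hypothetical toy numbers; the paper's stepwise construction and discounting are omitted):

```python
import numpy as np

def relative_reachability_penalty(reach_baseline, reach_current):
    """Sketch of the side-effects penalty: penalize states that are less
    reachable under the agent's behavior than under a baseline (e.g.
    do-nothing) policy. Each array maps states to reachability in [0, 1]."""
    diff = reach_baseline - reach_current
    return np.maximum(diff, 0.0).mean()   # only reductions are penalized

# Breaking a vase makes the "vase intact" state unreachable; the baseline
# keeps it reachable, so the agent incurs a positive penalty.
baseline    = np.array([1.0, 1.0, 1.0])
after_break = np.array([1.0, 0.0, 1.0])

assert relative_reachability_penalty(baseline, after_break) > 0
assert relative_reachability_penalty(baseline, baseline) == 0
```

Penalizing only reductions (the max with zero) is what distinguishes this from symmetric distance measures: making extra states reachable is not punished.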

Agents and Devices: A Relative Definition of Agency

no code implementations 31 May 2018 Laurent Orseau, Simon McGregor McGill, Shane Legg

According to Dennett, the same system may be described using a 'physical' (mechanical) explanatory stance, or using an 'intentional' (belief- and goal-based) explanatory stance.

AI Safety Gridworlds

2 code implementations 27 Nov 2017 Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.

reinforcement-learning · Reinforcement Learning (RL) +1

Noisy Networks for Exploration

15 code implementations ICLR 2018 Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.

Atari Games · Efficient Exploration +2
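The mechanism is easy to sketch: a linear layer whose weights are perturbed by Gaussian noise with per-parameter scales, so exploration comes from the parameters rather than from epsilon-greedy action noise. This is a simplified numpy sketch; the paper additionally uses factorized noise and learns the sigma parameters by gradient descent:

```python
import numpy as np

class NoisyLinear:
    """NoisyNet-style linear layer: weights = mu + sigma * epsilon,
    with fresh epsilon per forward pass during training."""
    def __init__(self, n_in, n_out, sigma0=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(n_in)
        self.mu_w = self.rng.uniform(-bound, bound, (n_out, n_in))
        self.mu_b = self.rng.uniform(-bound, bound, n_out)
        self.sigma_w = np.full((n_out, n_in), sigma0 * bound)
        self.sigma_b = np.full(n_out, sigma0 * bound)

    def __call__(self, x, noisy=True):
        if noisy:                      # sample fresh weight noise
            w = self.mu_w + self.sigma_w * self.rng.standard_normal(self.mu_w.shape)
            b = self.mu_b + self.sigma_b * self.rng.standard_normal(self.mu_b.shape)
        else:                          # evaluation: use the mean parameters
            w, b = self.mu_w, self.mu_b
        return w @ x + b

layer = NoisyLinear(4, 2)
x = np.ones(4)

# Noisy passes differ (stochastic policy drives exploration);
# the noiseless pass is deterministic.
assert not np.allclose(layer(x), layer(x))
assert np.allclose(layer(x, noisy=False), layer(x, noisy=False))
```

Because the sigma parameters are trained, the network can learn to anneal its own exploration instead of following a hand-tuned schedule.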

Deep reinforcement learning from human preferences

5 code implementations NeurIPS 2017 Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems.

Atari Games · reinforcement-learning +1
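The recipe this line of work popularized can be sketched with the Bradley-Terry model: fit a reward model so that trajectory segments the human prefers receive higher summed predicted reward. Below is a minimal 1-D sketch with a hypothetical linear reward model, not the paper's neural network:

```python
import numpy as np

def pref_prob(r_hat, seg_a, seg_b):
    """Bradley-Terry model: probability the human prefers segment A is the
    softmax of the summed predicted rewards of the two segments."""
    ra = sum(r_hat(s) for s in seg_a)
    rb = sum(r_hat(s) for s in seg_b)
    return np.exp(ra) / (np.exp(ra) + np.exp(rb))

# Hypothetical 1-D "states" and a linear reward model r(s) = theta * s.
theta = 0.0
seg_a, seg_b = [1.0, 2.0], [0.0, 0.5]   # the human prefers segment A
lr = 0.5
for _ in range(100):
    p = pref_prob(lambda s: theta * s, seg_a, seg_b)
    # gradient of the cross-entropy loss -log p with respect to theta
    grad = (p - 1.0) * (sum(seg_a) - sum(seg_b))
    theta -= lr * grad

assert theta > 0
assert pref_prob(lambda s: theta * s, seg_a, seg_b) > 0.9
```

The learned reward model then replaces the hand-written reward signal when training the RL agent, which is how complex goals get communicated from a modest number of human comparisons.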

Reinforcement Learning with a Corrupted Reward Channel

1 code implementation 23 May 2017 Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg

Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards.

reinforcement-learning · Reinforcement Learning (RL)

Universal Intelligence: A Definition of Machine Intelligence

no code implementations 20 Dec 2007 Shane Legg, Marcus Hutter

Finally, we survey the many other tests and definitions of intelligence that have been proposed for machines.
