Search Results for author: Jan Leike

Found 32 papers, 11 papers with code

Safe Deep RL in 3D Environments using Human Feedback

no code implementations · 20 Jan 2022 · Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike

In this paper we answer this question in the affirmative, using ReQueST to train an agent to perform a 3D first-person object collection task using data entirely from human contractors.

Revealing the Incentive to Cause Distributional Shift

no code implementations · 29 Sep 2021 · David Krueger, Tegan Maharaj, Jan Leike

We use these unit tests to demonstrate that changes to the learning algorithm (e.g. introducing meta-learning) can cause previously hidden incentives to be revealed, resulting in qualitatively different behaviour despite no change in the performance metric.


Recursively Summarizing Books with Human Feedback

no code implementations · 22 Sep 2021 · Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano

Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves.

Abstractive Text Summarization · Question Answering

Institutionalising Ethics in AI through Broader Impact Requirements

no code implementations · 30 May 2021 · Carina Prunkl, Carolyn Ashurst, Markus Anderljung, Helena Webb, Jan Leike, Allan Dafoe

In 2020, the Conference on Neural Information Processing Systems (NeurIPS) introduced a requirement for submitting authors to include a statement on the broader societal impacts of their research.

Active Reinforcement Learning: Observing Rewards at a Cost

no code implementations · 13 Nov 2020 · David Krueger, Jan Leike, Owain Evans, John Salvatier

Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0.

Multi-Armed Bandits · reinforcement-learning
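The query-cost mechanism from the abstract can be sketched as a toy Bernoulli bandit in which the reward is hidden unless the agent pays c to observe it. The epsilon-greedy policy, the fixed query probability, and all names below are illustrative assumptions for the ARL setting, not the paper's algorithm.

```python
import random

def active_bandit(arms, steps=1000, c=0.1, query_prob=0.2, seed=0):
    """Toy active RL bandit: rewards accrue every step, but the agent
    only *observes* a reward when it pays the query cost c.
    `arms` maps arm index -> true mean of a Bernoulli arm."""
    rng = random.Random(seed)
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}  # estimates from queried rewards only
    total = 0.0
    for _ in range(steps):
        # epsilon-greedy over the observed estimates
        if rng.random() < 0.1:
            a = rng.choice(list(arms))
        else:
            a = max(arms, key=lambda x: means[x])
        r = 1.0 if rng.random() < arms[a] else 0.0
        total += r
        # the agent decides (here: at random) whether to pay c to see r
        if rng.random() < query_prob:
            total -= c
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]
    return total / steps

print(active_bandit({0: 0.3, 1: 0.7}))
```

The interesting design question the paper raises is precisely the part stubbed out above: *when* to query, since each observation trades c of return for information.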

Hidden Incentives for Auto-Induced Distributional Shift

no code implementations · 19 Sep 2020 · David Krueger, Tegan Maharaj, Jan Leike

We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs.

Meta-Learning · Q-Learning

Quantifying Differences in Reward Functions

1 code implementation · ICLR 2021 · Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike

However, this method cannot distinguish between the learned reward function failing to reflect user preferences and the policy optimization process failing to optimize the learned reward.

Pitfalls of learning a reward function online

no code implementations · 28 Apr 2020 · Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg

We formally introduce two desirable properties: the first is 'unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise.

Learning Human Objectives by Evaluating Hypothetical Behavior

1 code implementation · ICML 2020 · Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike

To address this challenge, we propose an algorithm that safely and interactively learns a model of the user's reward function.

Car Racing

Scaling shared model governance via model splitting

no code implementations · ICLR 2019 · Miljan Martic, Jan Leike, Andrew Trask, Matteo Hessel, Shane Legg, Pushmeet Kohli

Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and secure multiparty computation.


Scalable agent alignment via reward modeling: a research direction

3 code implementations · 19 Nov 2018 · Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.

Atari Games · reinforcement-learning

Learning to Understand Goal Specifications by Modelling Reward

1 code implementation · ICLR 2019 · Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.

AI Safety Gridworlds

2 code implementations · 27 Nov 2017 · Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.

reinforcement-learning · Safe Exploration

Deep reinforcement learning from human preferences

6 code implementations · NeurIPS 2017 · Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems.

Atari Games · reinforcement-learning
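The core idea of this paper, learning a reward model from pairwise human comparisons of trajectory segments, uses a Bradley-Terry model: the probability that segment 1 is preferred is a softmax over the summed predicted rewards. A minimal sketch, assuming a linear reward and plain gradient ascent in place of the paper's neural network:

```python
import math

def p_prefer(r1, r2):
    """Bradley-Terry probability that segment 1 is preferred,
    given summed predicted rewards r1 and r2."""
    return 1.0 / (1.0 + math.exp(r2 - r1))

def train_reward(comparisons, dim, lr=0.5, epochs=200):
    """Fit a linear reward w.x from pairwise comparisons by gradient
    ascent on the log-likelihood. The linear form and optimizer are
    simplifications; the paper trains a deep reward model."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x1, x2, pref in comparisons:  # pref = 1 if segment 1 preferred
            r1 = sum(wi * xi for wi, xi in zip(w, x1))
            r2 = sum(wi * xi for wi, xi in zip(w, x2))
            g = pref - p_prefer(r1, r2)  # d log-likelihood / d(r1 - r2)
            for i in range(dim):
                w[i] += lr * g * (x1[i] - x2[i])
    return w

# Synthetic check: the human consistently prefers the first feature.
data = [((1.0, 0.0), (0.0, 1.0), 1), ((0.0, 1.0), (1.0, 0.0), 0)]
w = train_reward(data, dim=2)
print(w[0] > w[1])  # → True: learned reward favours the first feature
```

The learned reward then replaces the environment reward for a standard RL algorithm, so a few hundred comparisons can stand in for millions of reward observations.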

Universal Reinforcement Learning Algorithms: Survey and Experiments

1 code implementation · 30 May 2017 · John Aslanides, Jan Leike, Marcus Hutter

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP).


Generalised Discount Functions applied to a Monte-Carlo AImu Implementation

1 code implementation · 3 Mar 2017 · Sean Lamont, John Aslanides, Jan Leike, Marcus Hutter

We have added to the GRL simulation platform AIXIjs the functionality to assign an agent arbitrary discount functions, and an environment which can be used to determine the effect of discounting on an agent's policy.

General Reinforcement Learning · reinforcement-learning

Nonparametric General Reinforcement Learning

no code implementations · 28 Nov 2016 · Jan Leike

However, there are Bayesian approaches to general RL that satisfy objective optimality guarantees: We prove that Thompson sampling is asymptotically optimal in stochastic environments in the sense that its value converges to the value of the optimal policy.

General Reinforcement Learning · reinforcement-learning

Exploration Potential

no code implementations · 16 Sep 2016 · Jan Leike

We introduce exploration potential, a quantity that measures how much a reinforcement learning agent has explored its environment class.

Multi-Armed Bandits · reinforcement-learning

A Formal Solution to the Grain of Truth Problem

no code implementations · 16 Sep 2016 · Jan Leike, Jessica Taylor, Benya Fallenstein

In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class.

Loss Bounds and Time Complexity for Speed Priors

no code implementations · 12 Apr 2016 · Daniel Filan, Marcus Hutter, Jan Leike

On a polynomial time computable sequence our speed prior is computable in exponential time.

Thompson Sampling is Asymptotically Optimal in General Environments

no code implementations · 25 Feb 2016 · Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.
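Thompson sampling itself is easy to illustrate in the simplest special case, a finite-armed Bernoulli bandit with Beta(1,1) priors: sample an environment (here, an arm mean) from the posterior, then act greedily on the sample. This is a toy stand-in for the paper's countable classes of general stochastic environments, where the sampled object is an entire environment and the policy is followed for an effective horizon.

```python
import random

def thompson_bandit(true_means, steps=2000, seed=0):
    """Thompson sampling for Bernoulli bandits with Beta(1,1) priors.
    Returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior Beta parameters: 1 + successes
    beta = [1] * k   # 1 + failures
    pulls = [0] * k
    for _ in range(steps):
        # sample a mean from each arm's posterior, act greedily on the sample
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        r = 1 if rng.random() < true_means[a] else 0
        alpha[a] += r
        beta[a] += 1 - r
        pulls[a] += 1
    return pulls

pulls = thompson_bandit([0.2, 0.5, 0.8])
print(pulls)  # with enough steps, the best arm (index 2) dominates the pulls
```

Asymptotic optimality in the paper's sense means the value of this sampling policy converges to the optimal value; in the bandit toy that shows up as the posterior concentrating and suboptimal arms being pulled ever more rarely.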


On the Computability of AIXI

no code implementations · 19 Oct 2015 · Jan Leike, Marcus Hutter

Solomonoff induction and the reinforcement learning agent AIXI are proposed answers to this question.


Bad Universal Priors and Notions of Optimality

no code implementations · 16 Oct 2015 · Jan Leike, Marcus Hutter

A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM).

Solomonoff Induction Violates Nicod's Criterion

no code implementations · 15 Jul 2015 · Jan Leike, Marcus Hutter

Nicod's criterion states that observing a black raven is evidence for the hypothesis H that all ravens are black.

On the Computability of Solomonoff Induction and Knowledge-Seeking

no code implementations · 15 Jul 2015 · Jan Leike, Marcus Hutter

Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable.


Sequential Extensions of Causal and Evidential Decision Theory

no code implementations · 24 Jun 2015 · Tom Everitt, Jan Leike, Marcus Hutter

Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward.

Decision Making

Indefinitely Oscillating Martingales

no code implementations · 14 Aug 2014 · Jan Leike, Marcus Hutter

We construct a class of nonnegative martingale processes that oscillate indefinitely with high probability.
