no code implementations • 4 Jun 2018 • Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg
How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment?
no code implementations • 31 May 2018 • Laurent Orseau, Simon McGregor McGill, Shane Legg
According to Dennett, the same system may be described using a 'physical' (mechanical) explanatory stance, or using an 'intentional' (belief- and goal-based) explanatory stance.
no code implementations • 25 Feb 2016 • Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.
no code implementations • 8 Jan 2019 • Laurent Orseau, Tor Lattimore, Shane Legg
We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.
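The log-loss experts setting above has a classical robust baseline: the Bayesian mixture (exponential weights) forecaster, whose cumulative log-loss exceeds that of the best expert by at most log N. A minimal sketch, assuming a binary-outcome protocol; the function name and interface are illustrative, not from the paper:

```python
import math

def bayes_mixture_log_loss(expert_preds, outcomes):
    """Sequential Bayes mixture over experts for binary prediction under log-loss.

    expert_preds[t][i] is expert i's probability that outcome t is 1.
    Returns (mixture log-loss, best single-expert log-loss).
    """
    n = len(expert_preds[0])
    weights = [1.0 / n] * n          # uniform prior over experts
    mix_loss = 0.0
    expert_loss = [0.0] * n
    for preds, y in zip(expert_preds, outcomes):
        # mixture probability of outcome 1
        p = sum(w * q for w, q in zip(weights, preds))
        mix_loss += -math.log(p if y == 1 else 1.0 - p)
        # Bayesian update: reweight each expert by the likelihood
        # it assigned to the observed outcome
        likes = [q if y == 1 else 1.0 - q for q in preds]
        for i in range(n):
            expert_loss[i] += -math.log(likes[i])
            weights[i] *= likes[i]
        z = sum(weights)
        weights = [w / z for w in weights]
    return mix_loss, min(expert_loss)
```

The mixture's loss telescopes to the negative log of the prior-weighted likelihood, which gives the log N regret bound directly.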
no code implementations • 7 Jun 2019 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore
Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree.
no code implementations • 30 Jul 2019 • Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant
For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound.
no code implementations • 28 Apr 2020 • Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg
We formally introduce two desirable properties: the first is 'unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise.
no code implementations • 19 Jun 2020 • Eser Aygün, Zafarali Ahmed, Ankit Anand, Vlad Firoiu, Xavier Glorot, Laurent Orseau, Doina Precup, Shibl Mourad
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models.
no code implementations • NeurIPS 2020 • Laurent Orseau, Marcus Hutter, Omar Rivasplata
The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network.
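The subnetworks in question are typically identified by magnitude pruning: keep the largest-magnitude weights and zero out the rest. A toy sketch of the masking step, with a flat weight list and a function name chosen here for illustration:

```python
def magnitude_mask(weights, keep_fraction):
    """Binary mask keeping the largest-magnitude weights (magnitude pruning),
    the usual way candidate 'winning ticket' subnetworks are identified."""
    k = max(1, round(keep_fraction * len(weights)))
    # threshold at the k-th largest absolute value; ties may keep extras
    thresh = sorted(abs(w) for w in weights)[-k]
    return [1 if abs(w) >= thresh else 0 for w in weights]
```

In practice this is applied per layer to a trained network, and the surviving weights are reset and retrained in isolation.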
no code implementations • NeurIPS 2020 • Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg
To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default.
no code implementations • 5 Mar 2021 • Vlad Firoiu, Eser Aygun, Ankit Anand, Zafarali Ahmed, Xavier Glorot, Laurent Orseau, Lei Zhang, Doina Precup, Shibl Mourad
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models.
no code implementations • 20 Dec 2021 • Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad
Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains.
no code implementations • 29 Dec 2021 • Laurent Orseau, Marcus Hutter
We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms.
no code implementations • 31 Jul 2023 • Laurent Orseau, Marcus Hutter
However, to the best of our knowledge, there is no principled exact line search algorithm for general convex functions -- including piecewise-linear and max-compositions of convex functions -- that takes advantage of convexity.
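For intuition, a generic baseline for one-dimensional convex minimization is bisection on the sign of the (sub)gradient, exploiting the fact that a convex function's (sub)gradient is nondecreasing. This sketch is only an approximate baseline of the kind the quoted sentence contrasts with, not the paper's exact algorithm:

```python
def convex_line_search(grad, lo, hi, tol=1e-10):
    """Bisection on the sign of the (sub)gradient of a 1-D convex function.

    Assumes grad(lo) <= 0 <= grad(hi); since grad is nondecreasing for
    convex f, the minimizer lies where grad changes sign, and bisection
    localizes it to an interval of width tol.
    """
    assert grad(lo) <= 0 <= grad(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if grad(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```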
no code implementations • 6 Nov 2023 • Abbas Mehrabian, Ankit Anand, Hyunjik Kim, Nicolas Sonnerat, Matej Balog, Gheorghe Comanici, Tudor Berariu, Andrew Lee, Anian Ruoss, Anna Bulanova, Daniel Toyama, Sam Blackwell, Bernardino Romera Paredes, Petar Veličković, Laurent Orseau, Joonkyung Lee, Anurag Murty Naredla, Doina Precup, Adam Zsolt Wagner
This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erdős, which asks for graphs of a given size (number of nodes) that maximize the number of edges without containing 3- or 4-cycles.
1 code implementation • 21 Mar 2021 • Laurent Orseau, Levi H. S. Lelis
LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function.
1 code implementation • 6 Feb 2023 • Tim Genewein, Grégoire Delétang, Anian Ruoss, Li Kevin Wenliang, Elliot Catt, Vincent Dutordoir, Jordi Grau-Moya, Laurent Orseau, Marcus Hutter, Joel Veness
Memory-based meta-learning is a technique for approximating Bayes-optimal predictors.
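A concrete instance of the Bayes-optimal predictor that such meta-learners approximate is the Laplace rule of succession: the posterior predictive for i.i.d. binary data under a uniform prior over the Bernoulli parameter. Sketch for illustration:

```python
def laplace_predict(history):
    """Bayes-optimal probability that the next bit is 1, given past bits,
    under a uniform prior over Bernoulli parameters (Laplace's rule)."""
    return (sum(history) + 1) / (len(history) + 2)
```

A memory-based meta-learner trained across many Bernoulli tasks should converge to outputs close to this rule.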
4 code implementations • 28 May 2021 • Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger
We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL).
1 code implementation • 26 May 2023 • Laurent Orseau, Marcus Hutter, Levi H. S. Lelis
Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy.
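The ordering LTS uses can be sketched as a best-first search with priority d(n)/π(n), where d(n) is the node's depth and π(n) the product of policy probabilities along its path (the Levin cost). The callback names (children, policy, is_goal) are illustrative, not the paper's API:

```python
import heapq

def levin_tree_search(root, children, policy, is_goal, max_expansions=10_000):
    """Best-first search expanding nodes in increasing order of
    d(n) / pi(n): depth divided by path probability under the policy."""
    # entries: (levin_cost, tiebreaker, node, depth, path_prob)
    frontier = [(0.0, 0, root, 0, 1.0)]
    counter = 1
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, node, depth, prob = heapq.heappop(frontier)
        if is_goal(node):
            return node
        for child, p in zip(children(node), policy(node)):
            cp = prob * p
            if cp > 0:
                heapq.heappush(
                    frontier, ((depth + 1) / cp, counter, child, depth + 1, cp))
                counter += 1
    return None
```

The guarantee quoted above bounds the number of expansions by a quantity inversely proportional to the probability the policy assigns to a solution path.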
1 code implementation • 23 May 2017 • Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg
Traditional RL methods fare poorly in corrupt reward MDPs (CRMDPs), even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards.
1 code implementation • 19 Sep 2023 • Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness
We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning.
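The compression viewpoint rests on the identity between sequential prediction and arithmetic coding: a model assigning probability p(x_t | x_<t) to each symbol yields an ideal code length of -log2 p bits for that symbol. A minimal sketch, where model_prob is a stand-in for any sequential predictor (e.g. an LLM's next-token distribution):

```python
import math

def code_length_bits(model_prob, sequence):
    """Ideal (arithmetic-coding) code length of a sequence under a
    sequential predictive model: the sum of -log2 p(x_t | x_<t)."""
    total, history = 0.0, []
    for sym in sequence:
        p = model_prob(history, sym)
        total += -math.log2(p)
        history.append(sym)
    return total
```

A uniform model over a binary alphabet codes each symbol at exactly 1 bit; better predictors compress below that.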
1 code implementation • 26 Jan 2024 • Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness
Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data.
1 code implementation • NeurIPS 2018 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber
We introduce two novel tree search algorithms that use a policy to guide search.
1 code implementation • ICLR 2019 • Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity.
2 code implementations • 27 Nov 2017 • Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg
We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.