Search Results for author: Laurent Orseau

Found 26 papers, 10 papers with code

Super-Exponential Regret for UCT, AlphaGo and Variants

no code implementations · 7 May 2024 · Laurent Orseau, Remi Munos

We improve the proofs of the lower bounds of Coquelin and Munos (2007), which demonstrate that UCT can incur $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment, and that a `polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment; the original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present draft.

Learning Universal Predictors

1 code implementation · 26 Jan 2024 · Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness

Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data.

Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search

no code implementations · 6 Nov 2023 · Abbas Mehrabian, Ankit Anand, Hyunjik Kim, Nicolas Sonnerat, Matej Balog, Gheorghe Comanici, Tudor Berariu, Andrew Lee, Anian Ruoss, Anna Bulanova, Daniel Toyama, Sam Blackwell, Bernardino Romera Paredes, Petar Veličković, Laurent Orseau, Joonkyung Lee, Anurag Murty Naredla, Doina Precup, Adam Zsolt Wagner

This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erdős, which aims to find graphs of a given size (number of nodes) that maximize the number of edges while containing no 3- or 4-cycles.

Decision Making · Graph Generation

Language Modeling Is Compression

1 code implementation · 19 Sep 2023 · Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness

We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning.
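The compression viewpoint mentioned above rests on a standard fact: any sequential predictor induces a code in which the observed sequence costs $-\log_2 p(x_t \mid x_{<t})$ bits per symbol (achievable via arithmetic coding). A minimal sketch of that accounting, with a hypothetical uniform predictor standing in for a real language model:

```python
import math

def code_length_bits(sequence, predict):
    """Total arithmetic-coding cost, in bits, of `sequence` under a
    sequential predictor: the sum of -log2 p(x_t | x_<t)."""
    total = 0.0
    for t, symbol in enumerate(sequence):
        probs = predict(sequence[:t])      # distribution over next symbols
        total += -math.log2(probs[symbol])
    return total

# Hypothetical stand-in for a language model: a uniform predictor over a
# 4-symbol alphabet (a real LM would return sharper probabilities, hence
# a shorter code).
def uniform_predict(context):
    return {s: 0.25 for s in "abcd"}

print(code_length_bits("abca", uniform_predict))  # 4 symbols * 2 bits = 8.0
```

A better predictor assigns higher probability to what actually occurs, and the same formula then yields a shorter code, which is the sense in which modeling quality and compression rate coincide.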

In-Context Learning · Language Modelling

Line Search for Convex Minimization

no code implementations · 31 Jul 2023 · Laurent Orseau, Marcus Hutter

However, to the best of our knowledge, there is no principled exact line search algorithm for general convex functions -- including piecewise-linear and max-compositions of convex functions -- that takes advantage of convexity.

Levin Tree Search with Context Models

1 code implementation · 26 May 2023 · Laurent Orseau, Marcus Hutter, Levi H. S. Lelis

Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy.
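The guarantee referred to above can be sketched in the notation of the LevinTS line of work ($d(n)$ and $\pi(n)$ are assumed names here for the depth of a goal node $n$ and the probability the policy assigns to the path reaching it, not quoted verbatim from the paper):

```latex
% Sketch of the LTS expansion guarantee (notation assumed): for a goal
% node $n$ at depth $d(n)$ whose path receives probability $\pi(n)$
% under the policy, the number of node expansions $N(n)$ before $n$ is
% expanded is bounded on the order of
\[
  N(n) \;\lesssim\; \frac{d(n)}{\pi(n)},
\]
% so a sharper policy (larger $\pi(n)$ on the goal path) directly
% shrinks the worst-case search cost.
```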

Rubik's Cube

Isotuning With Applications To Scale-Free Online Learning

no code implementations · 29 Dec 2021 · Laurent Orseau, Marcus Hutter

We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms.

Proving Theorems using Incremental Learning and Hindsight Experience Replay

no code implementations · 20 Dec 2021 · Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad

Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains.

Automated Theorem Proving · Incremental Learning

Goal Misgeneralization in Deep Reinforcement Learning

4 code implementations · 28 May 2021 · Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger

We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL).

Navigate · Out-of-Distribution Generalization +2

Policy-Guided Heuristic Search with Guarantees

1 code implementation · 21 Mar 2021 · Laurent Orseau, Levi H. S. Lelis

LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function.

Training a First-Order Theorem Prover from Synthetic Data

no code implementations · 5 Mar 2021 · Vlad Firoiu, Eser Aygun, Ankit Anand, Zafarali Ahmed, Xavier Glorot, Laurent Orseau, Lei Zhang, Doina Precup, Shibl Mourad

A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models.

Automated Theorem Proving · BIG-bench Machine Learning

Avoiding Side Effects By Considering Future Tasks

no code implementations · NeurIPS 2020 · Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg

To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default.

Logarithmic Pruning is All You Need

no code implementations · NeurIPS 2020 · Laurent Orseau, Marcus Hutter, Omar Rivasplata

The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network.

Learning to Prove from Synthetic Theorems

no code implementations · 19 Jun 2020 · Eser Aygün, Zafarali Ahmed, Ankit Anand, Vlad Firoiu, Xavier Glorot, Laurent Orseau, Doina Precup, Shibl Mourad

A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models.

Automated Theorem Proving

Pitfalls of learning a reward function online

no code implementations · 28 Apr 2020 · Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg

We formally introduce two desirable properties: the first is `unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise.

Iterative Budgeted Exponential Search

no code implementations · 30 Jul 2019 · Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant

For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound.
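The core idea such budgeted methods build on can be illustrated with a simplified sketch (this is an illustration of exponentially growing expansion budgets in general, not the paper's actual IBEX algorithm; all function names here are hypothetical): re-run a budget-limited search with the budget doubled until the goal is found, so total work stays within a constant factor of the final successful run.

```python
def budgeted_dfs(node, children, is_goal, budget):
    """Depth-first search that stops after `budget` node expansions.
    Returns (goal_found, expansions_used)."""
    stack, used = [node], 0
    while stack and used < budget:
        n = stack.pop()
        used += 1
        if is_goal(n):
            return True, used
        stack.extend(children(n))
    return False, used

def exponential_budget_search(root, children, is_goal, start_budget=1):
    """Re-run budget-limited DFS with doubling budgets until a goal is
    found; returns the successful budget, or None if the whole (finite)
    space was exhausted without finding a goal."""
    budget = start_budget
    while True:
        found, used = budgeted_dfs(root, children, is_goal, budget)
        if found:
            return budget
        if used < budget:  # exhausted the space without hitting the budget
            return None
        budget *= 2

# Toy binary tree over integers: node n has children 2n and 2n+1 up to depth 4.
tree = lambda n: [2 * n, 2 * n + 1] if n < 8 else []
print(exponential_budget_search(1, tree, lambda n: n == 10))  # a power of two
```

Because budgets double, the failed runs together cost at most as much as the last run, which is the standard argument for why this scheme loses only a constant factor.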

Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees

no code implementations · 7 Jun 2019 · Laurent Orseau, Levi H. S. Lelis, Tor Lattimore

Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree.

Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

no code implementations · 8 Jan 2019 · Laurent Orseau, Tor Lattimore, Shane Legg

We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.
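A minimal sketch of a Soft-Bayes-style weight update for this setting (a sketch under assumed notation, not a verbatim transcription of the paper's algorithm): each expert reports the probability it assigned to the observed outcome, the mixture's own probability is the weighted average, and weights move multiplicatively toward experts that out-predicted the mixture.

```python
import numpy as np

def soft_bayes_update(w, x, eta):
    """One Soft-Bayes-style mixture update under log-loss (a sketch).

    w   -- current expert weights (a probability vector)
    x   -- x[i] is the probability expert i assigned to the observed outcome
    eta -- learning rate in (0, 1]

    The mixture's probability for the outcome is w @ x; each weight is
    scaled by (1 - eta + eta * x[i] / mixture), which preserves
    sum(w) == 1, so no renormalisation is needed.
    """
    mixture = w @ x
    return w * (1 - eta + eta * x / mixture)

# Hypothetical two-expert run: expert 0 keeps assigning high probability
# to the observed outcomes, so weight flows toward it over time.
w = np.array([0.5, 0.5])
for _ in range(10):
    w = soft_bayes_update(w, np.array([0.9, 0.1]), eta=0.5)
print(w)
```

The multiplicative form (rather than a full Bayesian posterior update) is what such Prod-style algorithms exploit to obtain robust log-loss guarantees.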

Penalizing side effects using stepwise relative reachability

no code implementations · 4 Jun 2018 · Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg

How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment?

Safe Reinforcement Learning

Agents and Devices: A Relative Definition of Agency

no code implementations · 31 May 2018 · Laurent Orseau, Simon McGregor McGill, Shane Legg

According to Dennett, the same system may be described using a `physical' (mechanical) explanatory stance, or using an `intentional' (belief- and goal-based) explanatory stance.

AI Safety Gridworlds

2 code implementations · 27 Nov 2017 · Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.

Reinforcement Learning (RL) +1

Reinforcement Learning with a Corrupted Reward Channel

1 code implementation · 23 May 2017 · Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg

Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards.

Reinforcement Learning (RL)

Thompson Sampling is Asymptotically Optimal in General Environments

no code implementations · 25 Feb 2016 · Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.

Reinforcement Learning (RL) +1
