no code implementations • 4 Jun 2018 • Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg
How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment?
no code implementations • 31 May 2018 • Laurent Orseau, Simon McGregor McGill, Shane Legg
According to Dennett, the same system may be described using a 'physical' (mechanical) explanatory stance, or using an 'intentional' (belief- and goal-based) explanatory stance.
no code implementations • 25 Feb 2016 • Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.
no code implementations • 8 Jan 2019 • Laurent Orseau, Tor Lattimore, Shane Legg
We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.
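The log-loss experts setting above has a classical robust baseline: the Bayesian mixture (exponential weights) forecaster, whose cumulative log-loss exceeds that of the best expert by at most log N. A minimal sketch, assuming a binary-outcome protocol; the function name and interface are illustrative, not from the paper:

```python
import math

def bayes_mixture_log_loss(expert_preds, outcomes):
    """Sequential Bayes mixture over experts for binary prediction under log-loss.

    expert_preds[t][i] is expert i's probability that outcome t is 1.
    Returns (mixture log-loss, best single-expert log-loss).
    """
    n = len(expert_preds[0])
    weights = [1.0 / n] * n          # uniform prior over experts
    mix_loss = 0.0
    expert_loss = [0.0] * n
    for preds, y in zip(expert_preds, outcomes):
        # mixture probability of outcome 1
        p = sum(w * q for w, q in zip(weights, preds))
        mix_loss += -math.log(p if y == 1 else 1.0 - p)
        # Bayesian update: reweight each expert by the likelihood
        # it assigned to the observed outcome
        likes = [q if y == 1 else 1.0 - q for q in preds]
        for i in range(n):
            expert_loss[i] += -math.log(likes[i])
            weights[i] *= likes[i]
        z = sum(weights)
        weights = [w / z for w in weights]
    return mix_loss, min(expert_loss)
```

The mixture's loss telescopes to the negative log of the prior-weighted likelihood, which gives the log N regret bound directly.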
no code implementations • 7 Jun 2019 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore
Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree.
no code implementations • 30 Jul 2019 • Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant
For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound.
no code implementations • 28 Apr 2020 • Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg
We formally introduce two desirable properties: the first is 'unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise.
no code implementations • 19 Jun 2020 • Eser Aygün, Zafarali Ahmed, Ankit Anand, Vlad Firoiu, Xavier Glorot, Laurent Orseau, Doina Precup, Shibl Mourad
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models.
no code implementations • NeurIPS 2020 • Laurent Orseau, Marcus Hutter, Omar Rivasplata
The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network.
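The subnetworks in question are typically identified by magnitude pruning: keep the largest-magnitude weights and zero out the rest. A toy sketch of the masking step, with a flat weight list and a function name chosen here for illustration:

```python
def magnitude_mask(weights, keep_fraction):
    """Binary mask keeping the largest-magnitude weights (magnitude pruning),
    the usual way candidate 'winning ticket' subnetworks are identified."""
    k = max(1, round(keep_fraction * len(weights)))
    # threshold at the k-th largest absolute value; ties may keep extras
    thresh = sorted(abs(w) for w in weights)[-k]
    return [1 if abs(w) >= thresh else 0 for w in weights]
```

In practice this is applied per layer to a trained network, and the surviving weights are reset and retrained in isolation.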
no code implementations • NeurIPS 2020 • Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg
To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default.
no code implementations • 5 Mar 2021 • Vlad Firoiu, Eser Aygun, Ankit Anand, Zafarali Ahmed, Xavier Glorot, Laurent Orseau, Lei Zhang, Doina Precup, Shibl Mourad
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models.
no code implementations • 20 Dec 2021 • Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad
Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains.
no code implementations • 29 Dec 2021 • Laurent Orseau, Marcus Hutter
We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms.
no code implementations • 31 Jul 2023 • Laurent Orseau, Marcus Hutter
However, to the best of our knowledge, there is no principled exact line search algorithm for general convex functions -- including piecewise-linear and max-compositions of convex functions -- that takes advantage of convexity.
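For intuition, a generic baseline for one-dimensional convex minimization is bisection on the sign of the (sub)gradient, exploiting the fact that a convex function's (sub)gradient is nondecreasing. This sketch is only an approximate baseline of the kind the quoted sentence contrasts with, not the paper's exact algorithm:

```python
def convex_line_search(grad, lo, hi, tol=1e-10):
    """Bisection on the sign of the (sub)gradient of a 1-D convex function.

    Assumes grad(lo) <= 0 <= grad(hi); since grad is nondecreasing for
    convex f, the minimizer lies where grad changes sign, and bisection
    localizes it to an interval of width tol.
    """
    assert grad(lo) <= 0 <= grad(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if grad(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```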
no code implementations • 6 Nov 2023 • Abbas Mehrabian, Ankit Anand, Hyunjik Kim, Nicolas Sonnerat, Matej Balog, Gheorghe Comanici, Tudor Berariu, Andrew Lee, Anian Ruoss, Anna Bulanova, Daniel Toyama, Sam Blackwell, Bernardino Romera Paredes, Petar Veličković, Laurent Orseau, Joonkyung Lee, Anurag Murty Naredla, Doina Precup, Adam Zsolt Wagner
This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erdős, which asks for graphs of a given size (number of nodes) that maximize the number of edges without containing 3- or 4-cycles.
1 code implementation • 21 Mar 2021 • Laurent Orseau, Levi H. S. Lelis
LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function.
1 code implementation • 6 Feb 2023 • Tim Genewein, Grégoire Delétang, Anian Ruoss, Li Kevin Wenliang, Elliot Catt, Vincent Dutordoir, Jordi Grau-Moya, Laurent Orseau, Marcus Hutter, Joel Veness
Memory-based meta-learning is a technique for approximating Bayes-optimal predictors.
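A concrete instance of the Bayes-optimal predictor that such meta-learners approximate is the Laplace rule of succession: the posterior predictive for i.i.d. binary data under a uniform prior over the Bernoulli parameter. Sketch for illustration:

```python
def laplace_predict(history):
    """Bayes-optimal probability that the next bit is 1, given past bits,
    under a uniform prior over Bernoulli parameters (Laplace's rule)."""
    return (sum(history) + 1) / (len(history) + 2)
```

A memory-based meta-learner trained across many Bernoulli tasks should converge to outputs close to this rule.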
4 code implementations • 28 May 2021 • Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger
We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL).
1 code implementation • 26 May 2023 • Laurent Orseau, Marcus Hutter, Levi H. S. Lelis
Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy.
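The ordering LTS uses can be sketched as a best-first search with priority d(n)/π(n), where d(n) is the node's depth and π(n) the product of policy probabilities along its path (the Levin cost). The callback names (children, policy, is_goal) are illustrative, not the paper's API:

```python
import heapq

def levin_tree_search(root, children, policy, is_goal, max_expansions=10_000):
    """Best-first search expanding nodes in increasing order of
    d(n) / pi(n): depth divided by path probability under the policy."""
    # entries: (levin_cost, tiebreaker, node, depth, path_prob)
    frontier = [(0.0, 0, root, 0, 1.0)]
    counter = 1
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, node, depth, prob = heapq.heappop(frontier)
        if is_goal(node):
            return node
        for child, p in zip(children(node), policy(node)):
            cp = prob * p
            if cp > 0:
                heapq.heappush(
                    frontier, ((depth + 1) / cp, counter, child, depth + 1, cp))
                counter += 1
    return None
```

The guarantee quoted above bounds the number of expansions by a quantity inversely proportional to the probability the policy assigns to a solution path.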
1 code implementation • 23 May 2017 • Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg
Traditional RL methods fare poorly in corrupt reward MDPs (CRMDPs), even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards.
1 code implementation • 19 Sep 2023 • Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness
We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning.
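The compression viewpoint rests on the identity between sequential prediction and arithmetic coding: a model assigning probability p(x_t | x_<t) to each symbol yields an ideal code length of -log2 p bits for that symbol. A minimal sketch, where model_prob is a stand-in for any sequential predictor (e.g. an LLM's next-token distribution):

```python
import math

def code_length_bits(model_prob, sequence):
    """Ideal (arithmetic-coding) code length of a sequence under a
    sequential predictive model: the sum of -log2 p(x_t | x_<t)."""
    total, history = 0.0, []
    for sym in sequence:
        p = model_prob(history, sym)
        total += -math.log2(p)
        history.append(sym)
    return total
```

A uniform model over a binary alphabet codes each symbol at exactly 1 bit; better predictors compress below that.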
1 code implementation • 26 Jan 2024 • Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness
Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data.
1 code implementation • NeurIPS 2018 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber
We introduce two novel tree search algorithms that use a policy to guide search.
1 code implementation • ICLR 2019 • Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity.
2 code implementations • 27 Nov 2017 • Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg
We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.