We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
Effective decision making involves flexibly relating past experiences and relevant contextual information to a novel situation.
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
no code implementations • 17 Feb 2022 • Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Peter C. Humphreys, Ksenia Konyushkova, Laurent SIfre, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell
In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior.
AlphaZero is a powerful reinforcement learning algorithm based on approximate policy iteration and tree search.
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss.
Ranked #8 on Atari Games on atari game
no code implementations • 18 Nov 2020 • Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards.
These results indicate where and how to utilize planning in reinforcement learning settings, and highlight a number of open questions for future MBRL research.
We explore whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles similarly to classic robot architectures.
1 code implementation • 11 Sep 2020 • Mehdi Mirza, Andrew Jaegle, Jonathan J. Hunt, Arthur Guez, Saran Tunyasuvunakool, Alistair Muldal, Théophane Weber, Peter Karkus, Sébastien Racanière, Lars Buesing, Timothy Lillicrap, Nicolas Heess
To encourage progress towards this goal we introduce a set of physically embedded planning problems and make them publicly available.
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
18 code implementations • 19 Nov 2019 • Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent SIfre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
Ranked #1 on Atari Games on Atari 2600 Alien
Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion, and have widespread use in the natural sciences to reduce the dimensionality of systems of equations.
1 code implementation • • Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity.
In contrast to off-policy algorithms based on Importance Sampling which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data.
They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree.
58 code implementations • 5 Dec 2017 • David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent SIfre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis
The game of chess is the most widely-studied domain in the history of artificial intelligence.
Ranked #1 on Game of Go on ELO Ratings
2 code implementations • • Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra
We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects.
One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning.
Most learning algorithms are not invariant to the scale of the function that is being approximated.
Ranked #12 on Atari Games on Atari 2600 Centipede
Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.
Ranked #1 on Atari Games on Atari 2600 Elevator Action
The popular Q-learning algorithm is known to overestimate action values under certain conditions.
Ranked #10 on Atari Games on Atari 2600 Gopher
The computational costs of inference and planning have confined Bayesian model-based reinforcement learning to one of two dismal fates: powerful Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian non-parametric models but using simple, myopic planning strategies such as Thompson sampling.
Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way.