no code implementations • 18 Jun 2023 • Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
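One common way to build foresight into gradient-based updates is the optimistic-gradient step from online learning, where the previous gradient serves as a cheap prediction of the next one. The sketch below is a generic illustration of that idea, not necessarily the exact update studied in the paper.

```python
def optimistic_policy_step(theta, grad_t, grad_prev, lr=1e-2):
    """Optimistic (extra-)gradient ascent step: extrapolate using the
    previous gradient as a prediction of the next one. A generic sketch
    of 'optimism as foresight', not necessarily the paper's exact update."""
    # theta_{t+1} = theta_t + lr * (2 * g_t - g_{t-1})
    return theta + lr * (2.0 * grad_t - grad_prev)
```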
no code implementations • 10 Jun 2022 • Peter C. Humphreys, Arthur Guez, Olivier Tieleman, Laurent Sifre, Théophane Weber, Timothy Lillicrap
Effective decision making involves flexibly relating past experiences and relevant contextual information to a novel situation.
1 code implementation • ICLR 2022 • Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
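For context, constrained RL objectives of this kind are commonly written as a Lagrangian saddle-point problem. This is a standard formulation rather than necessarily the exact objective optimized in the paper; in the offline setting the expectations would be estimated from the fixed dataset.

```latex
\max_{\pi} \; \min_{\lambda \ge 0} \;
\mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} r_{t}\right]
\;-\; \lambda \left( \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} c_{t}\right] - \hat{c} \right)
```

Here \(r_t\) is the reward, \(c_t\) the cost, and \(\hat{c}\) the cost threshold.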
no code implementations • 17 Feb 2022 • Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Peter C. Humphreys, Ksenia Konyushkova, Laurent Sifre, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell
In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior.
2 code implementations • ICLR 2022 • Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver
AlphaZero is a powerful reinforcement learning algorithm based on approximate policy iteration and tree search.
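As background, a technique commonly used to sample a small number of distinct candidate actions for search from a policy prior is the Gumbel-Top-k trick. The sketch below illustrates that trick only; it is not a claim about the paper's exact planning procedure.

```python
import numpy as np

def gumbel_top_k(logits, k, rng=None):
    """Sample k distinct actions without replacement from a categorical
    distribution parameterised by `logits`, via the Gumbel-Top-k trick:
    add independent Gumbel(0, 1) noise to each logit and keep the top k."""
    rng = np.random.default_rng() if rng is None else rng
    gumbels = rng.gumbel(size=logits.shape)
    return np.argsort(logits + gumbels)[::-1][:k]
```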
2 code implementations • 13 Apr 2021 • Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss.
Ranked #8 on Atari Games
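As a schematic illustration of "regularized policy optimization" in the sense described above, i.e. a policy-gradient term plus a KL penalty toward a prior or target policy, and explicitly not the paper's exact loss:

```python
import torch.nn.functional as F

def regularized_policy_loss(logits, prior_logits, actions, advantages, beta=1.0):
    """Policy-gradient term plus a KL(prior || policy) regularizer.
    A schematic sketch of regularized policy optimization, not the
    exact loss used in the paper."""
    log_probs = F.log_softmax(logits, dim=-1)
    chosen_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    pg_loss = -(advantages.detach() * chosen_log_probs).mean()
    kl = F.kl_div(log_probs, F.log_softmax(prior_logits, dim=-1),
                  log_target=True, reduction="batchmean")
    return pg_loss + beta * kl
```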
no code implementations • 18 Nov 2020 • Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards.
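One standard formalisation of an action's influence on future rewards is the advantage, the gap between the action-conditional and the state-conditional expected return. It is shown below for reference as background, not as the paper's contribution.

```latex
A^{\pi}(s, a) \;=\; Q^{\pi}(s, a) - V^{\pi}(s)
\;=\; \mathbb{E}\!\left[ G_t \mid S_t = s, A_t = a \right]
\;-\; \mathbb{E}\!\left[ G_t \mid S_t = s \right]
```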
no code implementations • ICLR 2021 • Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber
These results indicate where and how to utilize planning in reinforcement learning settings, and highlight a number of open questions for future MBRL research.
Model-based Reinforcement Learning, Reinforcement Learning (RL), +1
no code implementations • 3 Oct 2020 • Peter Karkus, Mehdi Mirza, Arthur Guez, Andrew Jaegle, Timothy Lillicrap, Lars Buesing, Nicolas Heess, Theophane Weber
We explore whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles, similar to classic robot architectures.
1 code implementation • 11 Sep 2020 • Mehdi Mirza, Andrew Jaegle, Jonathan J. Hunt, Arthur Guez, Saran Tunyasuvunakool, Alistair Muldal, Théophane Weber, Peter Karkus, Sébastien Racanière, Lars Buesing, Timothy Lillicrap, Nicolas Heess
To encourage progress towards this goal we introduce a set of physically embedded planning problems and make them publicly available.
no code implementations • NeurIPS 2020 • Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
18 code implementations • 19 Nov 2019 • Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
Ranked #1 on Atari Games (Atari 2600 Alien)
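MuZero plans with a learned model composed of a representation function, a dynamics function, and a prediction function. The class below is a schematic of that decomposition; the interfaces are illustrative rather than the released implementation.

```python
class MuZeroModel:
    """Schematic of MuZero's three learned components. Names follow the
    paper's high-level description; interfaces here are illustrative."""

    def __init__(self, representation, dynamics, prediction):
        self.h = representation  # observation    -> latent state s
        self.g = dynamics        # (s, action)    -> (reward, next s)
        self.f = prediction      # latent state s -> (policy, value)

    def initial_inference(self, observation):
        s = self.h(observation)
        policy, value = self.f(s)
        return s, policy, value

    def recurrent_inference(self, s, action):
        reward, s_next = self.g(s, action)
        policy, value = self.f(s_next)
        return s_next, reward, policy, value
```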
no code implementations • 1 Oct 2019 • Shruti Mishra, Abbas Abdolmaleki, Arthur Guez, Piotr Trochim, Doina Precup
Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion, and have widespread use in the natural sciences to reduce the dimensionality of systems of equations.
1 code implementation • ICLR 2019 • Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity.
no code implementations • ICLR 2019 • Lars Buesing, Theophane Weber, Yori Zwols, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau, Nicolas Heess
In contrast to off-policy algorithms based on Importance Sampling which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data.
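For reference, the importance-sampling re-weighting mentioned above takes the standard form below (notation mine), with \(\mu\) the behaviour policy that generated the logged trajectories and \(R^{i}\) the return of trajectory \(i\).

```latex
\hat{V}^{\pi} \;=\; \frac{1}{N} \sum_{i=1}^{N}
\left( \prod_{t=0}^{T-1} \frac{\pi(a^{i}_{t} \mid s^{i}_{t})}{\mu(a^{i}_{t} \mid s^{i}_{t})} \right) R^{i}
```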
2 code implementations • ICML 2018 • Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver
They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree.
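A minimal sketch of that simulate-evaluate-back-up loop, in generic Monte-Carlo tree-search style rather than the learned search studied in the paper:

```python
def run_simulation(root, select_child, evaluate, max_depth=50):
    """One generic tree-search simulation: descend from the root,
    evaluate the reached state, and back the value up along the path.
    A schematic sketch, not the paper's learned search procedure."""
    path, node = [root], root
    while node.children and len(path) < max_depth:
        node = select_child(node)      # e.g. a UCB-style selection rule
        path.append(node)
    value = evaluate(node.state)       # leaf evaluation (rollout or value network)
    for node in reversed(path):        # back-up: update visit counts and values
        node.visit_count += 1
        node.value_sum += value
    return value
```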
58 code implementations • 5 Dec 2017 • David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis
The game of chess is the most widely studied domain in the history of artificial intelligence.
Ranked #1 on Game of Go (ELO Ratings)
2 code implementations • NeurIPS 2017 • Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra
We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects.
Model-based Reinforcement Learning, Reinforcement Learning (RL), +1
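At a high level, I2A imagines rollouts with a learned environment model, encodes them, and combines the encodings with a model-free path. The function below is a schematic sketch under that description; all component names and interfaces are illustrative.

```python
import numpy as np

def i2a_features(obs, env_model, rollout_policy, encode_rollout,
                 model_free_net, actions, rollout_len=5):
    """Schematic I2A-style forward pass: imagine one rollout per candidate
    action with a learned model, encode each imagined trajectory, and
    concatenate with model-free features. Interfaces are illustrative."""
    encoded = []
    for a in actions:
        traj, s, action = [], obs, a
        for _ in range(rollout_len):
            s, r = env_model(s, action)        # learned model predicts next state, reward
            traj.append((s, r))
            action = rollout_policy(s)         # internal policy used only for imagining
        encoded.append(encode_rollout(traj))   # e.g. an RNN over the imagined trajectory
    # combine imagination features with a model-free path on the raw observation
    return np.concatenate(encoded + [model_free_net(obs)], axis=-1)
```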
1 code implementation • ICML 2017 • David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning.
no code implementations • NeurIPS 2016 • Hado van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver
Most learning algorithms are not invariant to the scale of the function that is being approximated.
Ranked #12 on Atari Games (Atari 2600 Centipede)
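The associated technique adaptively normalises the targets while preserving the network's outputs whenever the normalisation statistics change. A minimal sketch of that idea follows; the paper's exact update may differ in detail.

```python
import numpy as np

class PopArtHead:
    """Adaptive normalisation of value targets that preserves the
    unnormalised outputs when the statistics are updated. A minimal
    sketch of the idea; the paper's exact update may differ."""

    def __init__(self, n_features, beta=3e-4):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.mu, self.sigma, self.beta = 0.0, 1.0, beta

    def unnormalised(self, features):
        return self.sigma * (self.w @ features + self.b) + self.mu

    def normalise(self, target):
        return (target - self.mu) / self.sigma

    def update_stats(self, target):
        mu_new = (1 - self.beta) * self.mu + self.beta * target
        second = (1 - self.beta) * (self.sigma ** 2 + self.mu ** 2) + self.beta * target ** 2
        sigma_new = np.sqrt(max(second - mu_new ** 2, 1e-8))
        # Rescale weights and bias so unnormalised outputs are unchanged.
        self.w *= self.sigma / sigma_new
        self.b = (self.sigma * self.b + self.mu - mu_new) / sigma_new
        self.mu, self.sigma = mu_new, sigma_new
```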
2 code implementations • 15 Dec 2015 • Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos
Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.
Ranked #1 on Atari Games (Atari 2600 Elevator Action)
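As a hedged reconstruction of the consistent Bellman operator referred to above: on self-transitions it backs up the current action's value instead of the max, so the backup stays consistent with continuing to take the same action (up to notational differences with the paper).

```latex
(\mathcal{T}_{C} Q)(x, a) \;=\; r(x, a) + \gamma \,
\mathbb{E}_{x' \sim P(\cdot \mid x, a)}
\Big[ \mathbb{1}_{[x \neq x']} \max_{b} Q(x', b) + \mathbb{1}_{[x = x']}\, Q(x, a) \Big]
```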
97 code implementations • 22 Sep 2015 • Hado van Hasselt, Arthur Guez, David Silver
The popular Q-learning algorithm is known to overestimate action values under certain conditions.
Ranked #10 on Atari Games (Atari 2600 Gopher)
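The well-known remedy in this line of work decouples action selection from action evaluation. A minimal double-Q-style target in DQN form (online network selects, target network evaluates):

```python
import numpy as np

def double_q_target(reward, next_state, q_online, q_target, gamma=0.99, done=False):
    """Double-Q-style target: the online network selects the next action,
    the target network evaluates it, which reduces the overestimation
    of the standard max-based target."""
    if done:
        return reward
    best_action = int(np.argmax(q_online(next_state)))          # selection
    return reward + gamma * q_target(next_state)[best_action]   # evaluation
```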
no code implementations • NeurIPS 2014 • Arthur Guez, Nicolas Heess, David Silver, Peter Dayan
Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty.
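Bayes-adaptive planning augments the state with the posterior belief over models, so the optimal value function satisfies a Bellman equation over belief-augmented states. This is the standard formulation in my notation; \(b'\) denotes the posterior after observing the transition.

```latex
V^{*}(b, s) \;=\; \max_{a} \;
\mathbb{E}_{s' \sim P(\cdot \mid s, a, b)}
\Big[ r(s, a, s') + \gamma\, V^{*}(b', s') \Big]
```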
no code implementations • 9 Feb 2014 • Arthur Guez, David Silver, Peter Dayan
The computational costs of inference and planning have confined Bayesian model-based reinforcement learning to one of two dismal fates: powerful Bayes-adaptive planning, but only for simplistic models; or powerful Bayesian non-parametric models, but paired with simple, myopic planning strategies such as Thompson sampling.
Model-based Reinforcement Learning, Reinforcement Learning (RL), +1
no code implementations • NeurIPS 2012 • Arthur Guez, David Silver, Peter Dayan
Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way.
Model-based Reinforcement Learning, Reinforcement Learning (RL), +1