no code implementations • 20 Jul 2023 • David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh
In this paper we develop a foundation for continual reinforcement learning.
no code implementations • 20 Jul 2023 • David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh
Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing.
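One way to make this precise (an illustrative formalisation, not necessarily the paper's own definition) is that the policy eventually stops changing in every state:

```latex
% Illustrative formalisation of behavioural convergence:
% after some time T, the policy is fixed in every state.
\exists\, T \;\; \forall t \ge T,\; \forall s \in \mathcal{S}:\quad \pi_t(s) = \pi_T(s)
```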
no code implementations • 7 Mar 2023 • Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators.
no code implementations • 8 Feb 2023 • Chentian Jiang, Nan Rosemary Ke, Hado van Hasselt
To generalize across tasks, an agent should acquire knowledge from past tasks that facilitate adaptation and exploration in future tasks.
no code implementations • 9 Jan 2023 • Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh
We study the connection between gradient-based meta-learning and convex optimisation.
no code implementations • 15 Sep 2022 • Steven Kapturowski, Víctor Campos, Ray Jiang, Nemanja Rakićević, Hado van Hasselt, Charles Blundell, Adrià Puigdomènech Badia
The task of building general agents that perform well over a wide range of tasks has been an important goal in reinforcement learning since its inception.
no code implementations • 20 Feb 2022 • Veronica Chelu, Diana Borsa, Doina Precup, Hado van Hasselt
Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings.
no code implementations • 17 Jan 2022 • Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn 'off-policy' about policies that differ from the policy used to generate its experience.
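As a minimal illustration of the off-policy idea (our sketch, not the paper's method), a tabular TD(0) update can be corrected with an importance-sampling ratio between the target policy `pi` and the behaviour policy `mu`:

```python
def off_policy_td_update(v, s, a, r, s_next, pi, mu, alpha=0.1, gamma=0.99):
    """One off-policy TD(0) update with per-decision importance sampling.

    v: array of state values; pi, mu: [state, action] probability tables
    for the target and behaviour policies. Illustrative sketch only.
    """
    rho = pi[s, a] / mu[s, a]           # importance-sampling ratio
    td_error = r + gamma * v[s_next] - v[s]
    v[s] += alpha * rho * td_error      # weight the update by rho
    return v
```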
no code implementations • NeurIPS 2021 • Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment.
no code implementations • 8 Oct 2021 • Marta Garnelo, Wojciech Marian Czarnecki, SiQi Liu, Dhruva Tirumala, Junhyuk Oh, Gauthier Gidel, Hado van Hasselt, David Balduzzi
Strategic diversity is often essential in games: in multi-player games, for example, evaluating a player against a diverse set of strategies will yield a more accurate estimate of its performance.
no code implementations • ICLR 2022 • David Silver, Anirudh Goyal, Ivo Danihelka, Matteo Hessel, Hado van Hasselt
How should state be constructed from a sequence of observations, so as to best achieve some objective?
no code implementations • 22 Sep 2021 • Louis Kirsch, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, Yutian Chen
We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems.
1 code implementation • ICLR 2022 • Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh
We achieve a new state-of-the-art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.
no code implementations • 12 Jul 2021 • Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt
We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
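As context, the one-step follow-on trace underlying emphatic TD methods can be written as below (a minimal sketch with a constant discount; the paper's multi-step, replay-compatible weighting generalises this):

```python
def followon_trace(rhos, interests, gamma=0.99):
    """Follow-on trace F_t = i_t + gamma * rho_{t-1} * F_{t-1}.

    rhos: importance-sampling ratios; interests: per-state interest i_t.
    Emphatic TD methods use F_t to weight their updates.
    """
    traces, F = [], 0.0
    for t, i_t in enumerate(interests):
        F = i_t + (gamma * rhos[t - 1] * F if t > 0 else 0.0)
        traces.append(F)
    return traces
```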
no code implementations • 21 Jun 2021 • Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt
In this paper, we extend the use of emphatic methods to deep reinforcement learning agents.
3 code implementations • 13 Apr 2021 • Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt
Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration, with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive parts of training and inference in modern deep learning systems.
2 code implementations • 13 Apr 2021 • Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss.
Ranked #8 on Atari Games on atari game
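The general shape of such an update might look as follows (a hedged sketch of combining a regularized policy loss with a model-learning auxiliary term; not the paper's exact objective):

```python
import numpy as np

def combined_loss(logp, advantage, kl_to_prior, model_pred, model_target,
                  reg_coef=1.0, aux_coef=1.0):
    """Regularized policy optimization plus a model-learning auxiliary loss.

    Sketch only: an advantage-weighted log-probability term, a penalty for
    diverging from a prior policy, and a model-prediction error.
    """
    policy_loss = -logp * advantage                  # policy-gradient term
    reg_loss = reg_coef * kl_to_prior                # stay near the prior
    aux_loss = aux_coef * np.mean((model_pred - model_target) ** 2)
    return policy_loss + reg_loss + aux_loss
```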
2 code implementations • 24 Feb 2021 • David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song
We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two.
no code implementations • NeurIPS 2021 • Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
no code implementations • NeurIPS 2020 • Veronica Chelu, Doina Precup, Hado van Hasselt
We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new information, by planning with internal models of the world to improve its predictions.
1 code implementation • NeurIPS 2020 • Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver
Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments.
no code implementations • NeurIPS 2020 • Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver
In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment.
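The underlying recipe can be illustrated on a toy one-dimensional problem (our sketch, with the gradients worked out by hand; the paper uses automatic differentiation and a neural-network objective):

```python
def meta_gradient_1d(theta, eta, lr=0.1):
    """Meta-gradient on a toy problem (illustrative only).

    Inner objective L(theta; eta) = eta * theta**2, so one inner SGD step
    gives theta' = theta - lr * 2 * eta * theta. Outer performance is
    J(theta') = (theta' - 1)**2; we differentiate J w.r.t. eta.
    """
    theta_new = theta - lr * 2.0 * eta * theta       # inner update
    dtheta_new_deta = -lr * 2.0 * theta              # d theta' / d eta
    dJ_dtheta_new = 2.0 * (theta_new - 1.0)          # d J / d theta'
    return dJ_dtheta_new * dtheta_new_deta           # chain rule: d J / d eta
```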
no code implementations • 3 Jul 2020 • Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa
The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence.
no code implementations • NeurIPS 2020 • Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.
no code implementations • ICML 2020 • Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh
Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.
1 code implementation • NeurIPS 2019 • Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos
We consider the problem of efficient credit assignment in reinforcement learning.
no code implementations • 16 Oct 2019 • Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney
We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
no code implementations • NeurIPS 2019 • Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions.
2 code implementations • ICLR 2020 • Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt
bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.
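Running an agent on one bsuite environment looks roughly like this (a minimal sketch assuming the package's documented `load_from_id` entry point and the dm_env interface):

```python
import bsuite  # assumes the bsuite package is installed

env = bsuite.load_from_id('catch/0')   # one of the curated experiment ids

timestep = env.reset()
while not timestep.last():
    action = 0                         # plug in your agent's action here
    timestep = env.step(action)
```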
no code implementations • 8 Jul 2019 • Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto
We consider a general class of non-linear Bellman equations.
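For contrast (our notation, illustrative only): the standard Bellman equation is linear in the next state's value, whereas the class considered here allows a non-linear function of reward and next value:

```latex
% Standard (linear) Bellman equation:
v(s) = \mathbb{E}\left[ R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s \right]
% A non-linear generalisation (illustrative form):
v(s) = \mathbb{E}\left[ f\big(R_{t+1},\, v(S_{t+1})\big) \mid S_t = s \right]
```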
no code implementations • 5 Jul 2019 • Matteo Hessel, Hado van Hasselt, Joseph Modayil, David Silver
These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters.
2 code implementations • NeurIPS 2019 • Hado van Hasselt, Matteo Hessel, John Aslanides
We examine the question of when and how parametric models are most useful in reinforcement learning.
no code implementations • 8 May 2019 • Pedro A. Ortega, Jane. X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.
no code implementations • ICLR 2019 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin
Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.
1 code implementation • ICLR 2019 • Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul
We focus on one aspect in particular, namely the ability to generalise to unseen tasks.
no code implementations • 6 Dec 2018 • Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil
In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models - deep Q-networks trained with experience replay - analysing how the components of this system play a role in the emergence of the deadly triad, and in the agent's performance.
no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.
2 code implementations • 12 Sep 2018 • Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt
This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on.
Ranked #1 on Visual Navigation on Dmlab-30
no code implementations • NeurIPS 2018 • Zhongwen Xu, Hado van Hasselt, David Silver
Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function.
15 code implementations • ICLR 2018 • Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver
We propose a distributed architecture for deep reinforcement learning at scale that enables agents to learn effectively from orders of magnitude more data than previously possible.
Ranked #1 on Atari Games on Atari 2600 Boxing
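The core of such an architecture is a shared prioritized replay buffer fed by many actors and consumed by one learner; a toy single-process sketch (real systems shard the buffer and use sum-trees for efficiency):

```python
import numpy as np
from collections import deque

class PrioritizedReplay:
    """Toy prioritized replay buffer (illustrative only)."""
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)

    def add(self, transition, priority):
        # Many actor processes would call this with initial priorities.
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # A single learner samples proportionally to priority, updates the
        # network, and periodically broadcasts parameters to the actors.
        p = np.asarray(self.priorities, dtype=np.float64)
        idx = np.random.choice(len(self.data), batch_size, p=p / p.sum())
        return [self.data[i] for i in idx]
```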
no code implementations • 22 Feb 2018 • Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul
Some real-world domains are best characterized as a single task, but for others this perspective is limiting.
31 code implementations • 6 Oct 2017 • Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
The deep reinforcement learning community has made several independent improvements to the DQN algorithm.
10 code implementations • 16 Aug 2017 • Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing
Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain.
Ranked #1 on Starcraft II on MoveToBeacon
1 code implementation • ICML 2017 • David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning.
no code implementations • NeurIPS 2017 • André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver
Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks.
no code implementations • NeurIPS 2016 • Hado van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver
Most learning algorithms are not invariant to the scale of the function that is being approximated.
Ranked #12 on Atari Games on Atari 2600 Centipede
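A well-known remedy from this line of work is to adaptively normalise targets while preserving the network's outputs; a compact sketch of that idea (coefficients and safeguards differ in the paper):

```python
import numpy as np

class PopArtNormalizer:
    """Adaptively rescale targets and preserve outputs (sketch only).

    Tracks running mean and second moment of scalar targets; when the
    statistics change, the last linear layer (w, b) is rescaled so the
    unnormalised outputs are unchanged.
    """
    def __init__(self, beta=1e-3):
        self.mu, self.nu, self.beta = 0.0, 1.0, beta

    def update(self, target, w, b):
        mu_old = self.mu
        sigma_old = np.sqrt(self.nu - self.mu ** 2)
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        sigma_new = np.sqrt(self.nu - self.mu ** 2)
        w = w * sigma_old / sigma_new                      # preserve outputs
        b = (sigma_old * b + mu_old - self.mu) / sigma_new
        return w, b
```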
2 code implementations • 24 Dec 2015 • Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, Ben Coppin
Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems.
73 code implementations • 20 Nov 2015 • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
In recent years there have been many successes of using deep representations in reinforcement learning.
Ranked #1 on Atari Games on Atari 2600 Pong
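The dueling aggregation itself is simple: the paper combines a state-value stream and an advantage stream, subtracting the mean advantage for identifiability:

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    value: array broadcastable against advantages (e.g. shape [batch, 1]);
    advantages: shape [batch, num_actions].
    """
    return value + advantages - advantages.mean(axis=-1, keepdims=True)
```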
96 code implementations • 22 Sep 2015 • Hado van Hasselt, Arthur Guez, David Silver
The popular Q-learning algorithm is known to overestimate action values under certain conditions.
Ranked #10 on Atari Games on Atari 2600 Gopher
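The paper's remedy decouples action selection from action evaluation; a sketch of the resulting Double DQN target:

```python
import numpy as np

def double_q_target(r, gamma, q_online_next, q_target_next, done):
    """Select the next action with the online network, evaluate it with
    the target network, which reduces Q-learning's overestimation bias.

    q_online_next, q_target_next: shape [batch, num_actions];
    r, done: shape [batch] (done is 1.0 at terminal steps).
    """
    a_star = np.argmax(q_online_next, axis=-1)
    next_v = np.take_along_axis(q_target_next, a_star[:, None], axis=-1)[:, 0]
    return r + gamma * (1.0 - done) * next_v
```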
no code implementations • 19 Aug 2015 • Hado van Hasselt, Richard S. Sutton
If predictions are made at a high rate or span over a large amount of time, substantial computation can be required to store all relevant observations and to update all predictions when the outcome is finally observed.
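Eligibility traces are the classic way to keep per-step computation constant no matter how far predictions span; a minimal linear TD(lambda) sketch in that spirit (the paper develops span-independent methods along these lines):

```python
import numpy as np

def td_lambda_step(w, x, x_next, r, z, alpha=0.1, gamma=0.99, lam=0.9):
    """One linear TD(lambda) update with an accumulating eligibility trace.

    Per-step cost is O(len(w)) regardless of the prediction's span: no past
    observations are stored or revisited when the outcome finally arrives.
    """
    z = gamma * lam * z + x                        # decay and extend the trace
    delta = r + gamma * w @ x_next - w @ x         # TD error
    w = w + alpha * delta * z                      # credit traced features
    return w, z
```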