no code implementations • NeurIPS 2021 • Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment.
no code implementations • ICLR 2022 • David Silver, Anirudh Goyal, Ivo Danihelka, Matteo Hessel, Hado van Hasselt
How should state be constructed from a sequence of observations, so as to best achieve some objective?
no code implementations • 21 Jun 2021 • Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt
In this paper, we extend the use of emphatic methods to deep reinforcement learning agents.
2 code implementations • 13 Apr 2021 • Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt
Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration, with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive parts of training and inference in modern deep learning systems.
2 code implementations • 13 Apr 2021 • Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss.
Ranked #8 on Atari Games on atari game
no code implementations • NeurIPS 2021 • Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
1 code implementation • NeurIPS 2020 • Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver
Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments.
no code implementations • NeurIPS 2020 • Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver
In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment.
no code implementations • 3 Jul 2020 • Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa
The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence.
no code implementations • NeurIPS 2020 • Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.
no code implementations • ICML 2020 • Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh
Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.
no code implementations • ICML 2020 • Simon Schmitt, Matteo Hessel, Karen Simonyan
We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay, and (b) stability of off-policy learning where agents learn from other agents' behaviour.
Ranked #5 on Atari Games on Atari-57
no code implementations • NeurIPS 2019 • Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions.
2 code implementations • ICLR 2020 • Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt
bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.
no code implementations • 8 Jul 2019 • Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto
We consider a general class of non-linear Bellman equations.
no code implementations • 5 Jul 2019 • Matteo Hessel, Hado van Hasselt, Joseph Modayil, David Silver
These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters.
2 code implementations • NeurIPS 2019 • Hado van Hasselt, Matteo Hessel, John Aslanides
We examine the question of when and how parametric models are most useful in reinforcement learning.
no code implementations • ICLR 2019 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin
Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.
no code implementations • ICML 2018 • André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos
In this paper we extend the SFs & GPI framework in two ways.
no code implementations • ICLR 2019 • Miljan Martic, Jan Leike, Andrew Trask, Matteo Hessel, Shane Legg, Pushmeet Kohli
Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and secure multiparty computation.
no code implementations • 6 Dec 2018 • Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil
In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models (deep Q-networks trained with experience replay), analysing how the components of this system play a role in the emergence of the deadly triad, and in the agent's performance.
2 code implementations • 12 Sep 2018 • Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt
This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on.
Ranked #1 on Visual Navigation on Dmlab-30
no code implementations • 29 May 2018 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin
Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.
15 code implementations • ICLR 2018 • Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver
We propose a distributed architecture for deep reinforcement learning at scale that enables agents to learn effectively from orders of magnitude more data than previously possible.
Ranked #1 on Atari Games on Atari 2600 Boxing
no code implementations • 22 Feb 2018 • Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul
Some real-world domains are best characterized as a single task, but for others this perspective is limiting.
30 code implementations • 6 Oct 2017 • Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
The deep reinforcement learning community has made several independent improvements to the DQN algorithm.
1 code implementation • ICML 2017 • David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning.
no code implementations • NeurIPS 2016 • Hado van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver
Most learning algorithms are not invariant to the scale of the function that is being approximated.
Ranked #12 on Atari Games on Atari 2600 Centipede
71 code implementations • 20 Nov 2015 • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
In recent years there have been many successes of using deep representations in reinforcement learning.
Ranked #1 on Atari Games on Atari 2600 Pong