Search Results for author: Hado van Hasselt

Found 51 papers, 17 papers with code

On the Convergence of Bounded Agents

no code implementations20 Jul 2023 David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh

Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing.


Exploration via Epistemic Value Estimation

no code implementations7 Mar 2023 Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators.

Decision Making Efficient Exploration +1

Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration

no code implementations8 Feb 2023 Chentian Jiang, Nan Rosemary Ke, Hado van Hasselt

To generalize across tasks, an agent should acquire knowledge from past tasks that facilitate adaptation and exploration in future tasks.

Bayesian Inference Thompson Sampling

Optimistic Meta-Gradients

no code implementations9 Jan 2023 Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh

We study the connection between gradient-based meta-learning and convex op-timisation.


Human-level Atari 200x faster

no code implementations15 Sep 2022 Steven Kapturowski, Víctor Campos, Ray Jiang, Nemanja Rakićević, Hado van Hasselt, Charles Blundell, Adrià Puigdomènech Badia

The task of building general agents that perform well over a wide range of tasks has been an importantgoal in reinforcement learning since its inception.

Selective Credit Assignment

no code implementations20 Feb 2022 Veronica Chelu, Diana Borsa, Doina Precup, Hado van Hasselt

Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings.

reinforcement-learning Reinforcement Learning (RL)

Chaining Value Functions for Off-Policy Learning

no code implementations17 Jan 2022 Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn `off-policy' about policies that differ from the policy used to generate its experience.

reinforcement-learning Reinforcement Learning (RL)

Self-Consistent Models and Values

no code implementations NeurIPS 2021 Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver

Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment.

reinforcement-learning Reinforcement Learning (RL)

Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity

no code implementations8 Oct 2021 Marta Garnelo, Wojciech Marian Czarnecki, SiQi Liu, Dhruva Tirumala, Junhyuk Oh, Gauthier Gidel, Hado van Hasselt, David Balduzzi

Strategic diversity is often essential in games: in multi-player games, for example, evaluating a player against a diverse set of strategies will yield a more accurate estimate of its performance.

Learning by Directional Gradient Descent

no code implementations ICLR 2022 David Silver, Anirudh Goyal, Ivo Danihelka, Matteo Hessel, Hado van Hasselt

How should state be constructed from a sequence of observations, so as to best achieve some objective?

Introducing Symmetries to Black Box Meta Reinforcement Learning

no code implementations22 Sep 2021 Louis Kirsch, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, Yutian Chen

We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems.

Meta-Learning Meta Reinforcement Learning +2

Bootstrapped Meta-Learning

1 code implementation ICLR 2022 Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

We achieve a new state-of-the art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.

Efficient Exploration Few-Shot Learning +1

Learning Expected Emphatic Traces for Deep RL

no code implementations12 Jul 2021 Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt

We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.

Podracer architectures for scalable Reinforcement Learning

3 code implementations13 Apr 2021 Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt

Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration, with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive parts of training and inference in modern deep learning systems.

reinforcement-learning Reinforcement Learning (RL)

Synthetic Returns for Long-Term Credit Assignment

2 code implementations24 Feb 2021 David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song

We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two.

Forethought and Hindsight in Credit Assignment

no code implementations NeurIPS 2020 Veronica Chelu, Doina Precup, Hado van Hasselt

We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new information, by planning with internal models of the world to improve its predictions.

Reinforcement Learning (RL)

Discovering Reinforcement Learning Algorithms

1 code implementation NeurIPS 2020 Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments.

Atari Games Meta-Learning +3

Meta-Gradient Reinforcement Learning with an Objective Discovered Online

no code implementations NeurIPS 2020 Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver

In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment.

Q-Learning reinforcement-learning +1

Expected Eligibility Traces

no code implementations3 Jul 2020 Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa

The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence.

A Self-Tuning Actor-Critic Algorithm

no code implementations NeurIPS 2020 Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.

Atari Games reinforcement-learning +1

What Can Learned Intrinsic Rewards Capture?

no code implementations ICML 2020 Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.

Discovery of Useful Questions as Auxiliary Tasks

no code implementations NeurIPS 2019 Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions.

Reinforcement Learning (RL)

Towards Consistent Performance on Atari using Expert Demonstrations

no code implementations ICLR 2019 Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Atari Games Reinforcement Learning (RL)

Deep Reinforcement Learning and the Deadly Triad

no code implementations6 Dec 2018 Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil

In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models - deep Q-networks trained with experience replay - analysing how the components of this system play a role in the emergence of the deadly triad, and in the agent's performance

Learning Theory reinforcement-learning +1

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations16 Nov 2018 Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Multi-task Deep Reinforcement Learning with PopArt

2 code implementations12 Sep 2018 Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt

This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on.

Atari Games Multi-Task Learning +2

Observe and Look Further: Achieving Consistent Performance on Atari

no code implementations29 May 2018 Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Montezuma's Revenge Reinforcement Learning (RL)

Meta-Gradient Reinforcement Learning

no code implementations NeurIPS 2018 Zhongwen Xu, Hado van Hasselt, David Silver

Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function.

Meta-Learning reinforcement-learning +1

Distributed Prioritized Experience Replay

15 code implementations ICLR 2018 Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver

We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible.

Atari Games reinforcement-learning +1

Deep Reinforcement Learning in Large Discrete Action Spaces

2 code implementations24 Dec 2015 Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, Ben Coppin

Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems.

Recommendation Systems reinforcement-learning +1

Deep Reinforcement Learning with Double Q-learning

96 code implementations22 Sep 2015 Hado van Hasselt, Arthur Guez, David Silver

The popular Q-learning algorithm is known to overestimate action values under certain conditions.

Atari Games Q-Learning +1

Learning to Predict Independent of Span

no code implementations19 Aug 2015 Hado van Hasselt, Richard S. Sutton

If predictions are made at a high rate or span over a large amount of time, substantial computation can be required to store all relevant observations and to update all predictions when the outcome is finally observed.

Cannot find the paper you are looking for? You can Submit a new open access paper.