1 code implementation • 13 Feb 2024 • Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.
Distributional Reinforcement Learning • Model-based Reinforcement Learning • +1
no code implementations • NeurIPS 2023 • David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh
Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents.
no code implementations • 20 Jul 2023 • David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh
Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing.
no code implementations • 17 Jun 2022 • Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto
We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL.
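One way to picture this interpolation, as a hedged sketch rather than the paper's algorithm, is h-step lookahead in a known tabular model: h = 1 reproduces the one-step greedy backup of value-based RL, while larger h moves toward full model-based planning. The function names and the tabular setting here are illustrative assumptions.

```python
import numpy as np

def lookahead_value(P, R, V, gamma, h, s):
    """Value of state s under h-step lookahead in a known tabular model.

    P[s, a, s'] : transition probabilities
    R[s, a]     : expected rewards
    V[s]        : current value estimate, used to bootstrap at the leaves
    """
    if h == 0:
        return V[s]  # bootstrap from the value estimate at the leaves
    backups = []
    for a in range(P.shape[1]):
        future = sum(P[s, a, s2] * lookahead_value(P, R, V, gamma, h - 1, s2)
                     for s2 in range(P.shape[2]))
        backups.append(R[s, a] + gamma * future)
    return max(backups)

def greedy_h_step(P, R, V, gamma, h, s):
    """Pick the action whose h-step lookahead value is largest.

    h = 1 is the usual greedy policy; larger h plans further with the model.
    """
    vals = [R[s, a] + gamma * sum(P[s, a, s2] * lookahead_value(P, R, V, gamma, h - 1, s2)
                                  for s2 in range(P.shape[2]))
            for a in range(P.shape[1])]
    return int(np.argmax(vals))
```

The exponential cost in h is the usual price of deeper planning; the paper's contribution is how to move along this spectrum, which this toy recursion only gestures at.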
no code implementations • 1 Jun 2022 • Tom Schaul, André Barreto, John Quan, Georg Ostrovski
We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning.
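In the tabular case, churn can be quantified as the fraction of states whose greedy action changes between consecutive updates; this helper is an illustrative sketch of that measurement, not the paper's exact protocol.

```python
import numpy as np

def policy_churn(Q_before, Q_after):
    """Fraction of states whose greedy action changed after one update.

    Q_before, Q_after : arrays of shape [num_states, num_actions] holding
    action-value estimates before and after a learning step. Nonzero churn
    means the greedy policy moved even when the values barely did.
    """
    greedy_before = np.argmax(Q_before, axis=1)
    greedy_after = np.argmax(Q_after, axis=1)
    return float(np.mean(greedy_before != greedy_after))
```

A tiny value perturbation near a tie in Q is enough to flip the argmax, which is why churn can be large even under small parameter updates.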
no code implementations • 8 Dec 2021 • Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero
Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms.
Model-based Reinforcement Learning • Rolling Shutter Correction
no code implementations • NeurIPS 2019 • André Barreto, Diana Borsa, Shaobo Hou, Gheorghe Comanici, Eser Aygün, Philippe Hamel, Daniel Toyama, Jonathan Hunt, Shibl Mourad, David Silver, Doina Precup
Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options.
1 code implementation • NeurIPS 2021 • Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh
The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning.
Model-based Reinforcement Learning • Reinforcement Learning (RL)
no code implementations • NeurIPS 2021 • Michael Gimelfarb, André Barreto, Scott Sanner, Chi-Guhn Lee
Sample efficiency and risk-awareness are central to the development of practical reinforcement learning (RL) for complex decision-making.
no code implementations • NeurIPS 2020 • Christopher Grimm, André Barreto, Satinder Singh, David Silver
As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates.
Model-based Reinforcement Learning • reinforcement-learning • +2
1 code implementation • Proceedings of the National Academy of Sciences 2020 • André Barreto, Shaobo Hou, Diana Borsa, David Silver, Doina Precup
Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
no code implementations • 3 Jul 2020 • Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa
The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence.
no code implementations • 3 Jun 2020 • Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver
To test our hypothesis empirically, we augmented a standard deep RL agent with an auxiliary task of learning the value-improvement path.
no code implementations • ICLR 2021 • Will Dabney, Georg Ostrovski, André Barreto
Recent work on exploration in reinforcement learning (RL) has led to a series of increasingly complex solutions to the problem.
no code implementations • ICML 2018 • André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos
In this paper we extend the SFs & GPI framework in two ways.
1 code implementation • ICLR 2019 • Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul
We focus on one aspect in particular, namely the ability to generalise to unseen tasks.
2 code implementations • NeurIPS 2018 • Steven Hansen, Pablo Sprechmann, Alexander Pritzel, André Barreto, Charles Blundell
We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer.
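The core idea admits a very small sketch: blend the parametric action values with a non-parametric estimate derived from stored trajectories near the current state. The mixing weight, the n-step return helper, and the function names below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def eva_value(q_theta, q_nonparam, lam=0.4):
    """Blend a parametric value estimate with a non-parametric one.

    q_theta    : action values from the learned network for a state
    q_nonparam : action values estimated from nearby trajectories in the
                 replay buffer (e.g. returns along stored neighbours)
    lam        : mixing weight; lam = 1 recovers the purely parametric agent
    """
    return lam * q_theta + (1.0 - lam) * q_nonparam

def nstep_return(rewards, gamma, bootstrap):
    """n-step return along a stored trajectory, bootstrapped at the end."""
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Because the adjustment is computed from the buffer at decision time rather than baked into the network weights, it is "ephemeral": it sharpens behaviour immediately without waiting for gradient updates to propagate the same information.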
no code implementations • 22 Feb 2018 • Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul
Some real-world domains are best characterized as a single task, but for others this perspective is limiting.
no code implementations • NeurIPS 2017 • André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver
Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks.
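With successor features, transfer has a particularly compact form: if rewards decompose as r(s, a) = φ(s, a)·w and each past policy π_i has successor features ψ^{π_i}, then its value on a new task w is Q^{π_i}(s, a) = ψ^{π_i}(s, a)·w, and generalised policy improvement (GPI) acts greedily with respect to the maximum over all of them. The sketch below assumes a tabular layout; the array shapes and function name are ours.

```python
import numpy as np

def gpi_action(psis, w, s):
    """Generalised policy improvement over successor features.

    psis : array [num_policies, num_states, num_actions, d] of successor
           features psi^{pi_i}(s, a) for previously learned policies
    w    : reward weights of the new task, so that r(s, a) = phi(s, a) . w
    Each old policy's value on the new task is Q^{pi_i}(s, a) =
    psi^{pi_i}(s, a) . w; GPI acts greedily w.r.t. the max over policies.
    """
    q = psis[:, s] @ w                 # [num_policies, num_actions]
    return int(np.argmax(q.max(axis=0)))
```

The appeal is that no new learning is needed before acting on the new task: the dot products re-evaluate every stored policy under the new reward, and GPI guarantees the resulting policy is no worse than any of them.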