no code implementations • 22 Jan 2025 • Bernardo Ávila Pires, Mark Rowland, Diana Borsa, Zhaohan Daniel Guo, Khimya Khetarpal, André Barreto, David Abel, Rémi Munos, Will Dabney
To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step).
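A minimal sketch of what stock augmentation could look like in code (the `env_step` interface and all names are hypothetical, not the paper's implementation): the MDP state is paired with a running statistic of the discounted rewards collected so far, and policies and return distributions are then defined over the augmented pair.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AugmentedState:
    state: int    # original MDP state
    stock: float  # statistic of the rewards obtained since the first time step

def augmented_step(env_step, aug_state, action, disc_so_far):
    """One transition of the stock-augmented MDP.

    Assumes an interface env_step(state, action) -> (next_state, reward);
    disc_so_far is the discount accumulated up to the current time step.
    """
    next_state, reward = env_step(aug_state.state, action)
    # The stock accumulates the discounted reward seen so far; a risk-sensitive
    # policy can condition on (state, stock) rather than on state alone.
    next_stock = aug_state.stock + disc_so_far * reward
    return AugmentedState(next_state, next_stock), reward
```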
no code implementations • 4 Jun 2024 • Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Ávila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney
In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework, characterizing its convergence properties and highlighting important distinctions between the limiting solutions of the BYOL-$\Pi$ and BYOL-AC dynamics.
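As a rough illustration of an action-conditional self-predictive objective in the BYOL-AC spirit (the linear encoders and all names below are simplifying assumptions, not the paper's setup): an online encoder predicts, through a per-action linear map, a frozen target encoder's embedding of the next observation.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, latent_dim, num_actions = 8, 4, 3

phi = rng.normal(size=(latent_dim, obs_dim))        # online encoder (linear, for illustration)
phi_target = phi.copy()                             # target encoder, held fixed (stop-gradient)
P = rng.normal(size=(num_actions, latent_dim, latent_dim))  # one predictor per action

def byol_ac_loss(obs, action, next_obs):
    z = phi @ obs                   # online latent
    z_next = phi_target @ next_obs  # target latent; no gradient flows through it
    pred = P[action] @ z            # action-conditional prediction of the next latent
    return 0.5 * np.sum((pred - z_next) ** 2)

obs, next_obs = rng.normal(size=obs_dim), rng.normal(size=obs_dim)
print(byol_ac_loss(obs, action=1, next_obs=next_obs))
```

The BYOL-$\Pi$ variant omits the action conditioning, predicting the next latent averaged under the policy; that difference is what the ODE analysis makes precise.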
no code implementations • 17 Jun 2022 • Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto
We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL.
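The paper's own construction differs, but the spectrum it interpolates over can be illustrated with a simple depth-limited lookahead (the `model` interface below is assumed, and a deterministic model is used for brevity): depth 0 is the greedy value-based policy, and increasing depth approaches full model-based planning.

```python
def q_depth(model, value_fn, state, action, depth, gamma=0.99):
    """Assumes a deterministic model(state, action) -> (next_state, reward)
    that also exposes model.actions(state)."""
    next_state, reward = model(state, action)
    if depth == 0:
        return reward + gamma * value_fn(next_state)  # pure value-based bootstrap
    # Otherwise expand the model one more step before bootstrapping.
    return reward + gamma * max(
        q_depth(model, value_fn, next_state, a, depth - 1, gamma)
        for a in model.actions(next_state)
    )

def improved_policy(model, value_fn, state, depth):
    # depth = 0 recovers the greedy value-based policy; growing depth
    # approaches full model-based planning over the horizon.
    return max(model.actions(state),
               key=lambda a: q_depth(model, value_fn, state, a, depth))
```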
no code implementations • 20 Feb 2022 • Veronica Chelu, Diana Borsa, Doina Precup, Hado van Hasselt
Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings.
no code implementations • 8 Dec 2021 • Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero
Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms.
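A minimal sketch of that idea under an assumed `model`/`value_fn`/`policy` interface: unroll the single learned model to several depths, bootstrap each rollout with the single learned value function, and treat disagreement among the resulting k-step value estimates as the uncertainty signal, an implicit ensemble that needs no extra networks.

```python
import numpy as np

def k_step_values(model, value_fn, policy, state, K, gamma=0.99):
    """Assumes model(state, action) -> (next_state, reward)."""
    estimates, ret, disc = [], 0.0, 1.0
    for _ in range(K + 1):
        estimates.append(ret + disc * value_fn(state))  # k-step value estimate
        action = policy(state)
        state, reward = model(state, action)            # one model-rollout step
        ret += disc * reward
        disc *= gamma
    return np.array(estimates)

def inconsistency(model, value_fn, policy, state, K=5):
    # Large spread among the k-step estimates signals epistemic uncertainty.
    return np.std(k_step_values(model, value_fn, policy, state, K))
```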
no code implementations • NeurIPS 2021 • Miruna Pîslar, David Szepesvari, Georg Ostrovski, Diana Borsa, Tom Schaul
Exploration remains a central challenge for reinforcement learning (RL).
no code implementations • NeurIPS 2019 • André Barreto, Diana Borsa, Shaobo Hou, Gheorghe Comanici, Eser Aygün, Philippe Hamel, Daniel Toyama, Jonathan Hunt, Shibl Mourad, David Silver, Doina Precup
Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options.
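The combination step can be sketched as follows (array shapes are illustrative assumptions): given successor features for the policies of the known options and a weight vector expressing the new cumulant as a linear combination of the known ones, generalised policy improvement evaluates every known option on the new cumulant and acts greedily over the best of them.

```python
import numpy as np

def gpi_action(psi, w):
    """psi: [num_options, num_actions, d] successor features of known options;
    w: [d] weights expressing the new cumulant in the known-cumulant basis."""
    q = psi @ w                            # [num_options, num_actions]
    return int(np.argmax(q.max(axis=0)))   # best action under the best option
```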
no code implementations • 11 May 2021 • Tom Schaul, Georg Ostrovski, Iurii Kemaev, Diana Borsa
Scaling issues are mundane yet irritating for practitioners of reinforcement learning.
1 code implementation • Proceedings of the National Academy of Sciences 2020 • André Barreto, Shaobo Hou, Diana Borsa, David Silver, Doina Precup
Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
no code implementations • 3 Jul 2020 • Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa
The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence.
no code implementations • 14 Dec 2019 • Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero
Determining what experience to generate to best facilitate learning (i.e., exploration) is one of the distinguishing features and open challenges in reinforcement learning.
no code implementations • 16 Oct 2019 • Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney
We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
no code implementations • 8 Jul 2019 • Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, André Barreto
We consider a general class of non-linear Bellman equations.
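One concrete member of such a class, for illustration only (the paper treats a more general family): tabular policy evaluation where a non-identity transform f is applied inside the expectation, making the fixed-point equation non-linear in the return. If f is a nonexpansion, the update below remains a gamma-contraction, so the iteration still converges; f equal to the identity recovers the standard linear Bellman equation.

```python
import numpy as np

def nonlinear_policy_evaluation(P, R, f, gamma=0.9, iters=1000):
    """P: [S, S] state-transition matrix under a fixed policy; R: [S] rewards.
    Iterates v(s) <- sum_{s'} P[s, s'] * f(R[s] + gamma * v(s'))."""
    v = np.zeros(len(R))
    for _ in range(iters):
        v = np.sum(P * f(R[:, None] + gamma * v[None, :]), axis=1)
    return v

# Example: a bounded, nonexpansive transform.
# v = nonlinear_policy_evaluation(P, R, f=np.tanh)
```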
no code implementations • 25 Apr 2019 • Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu
Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms.
no code implementations • 26 Feb 2019 • Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Rémi Munos, Doina Precup
In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents.
no code implementations • ICML 2018 • André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos
In this paper we extend the SFs & GPI framework in two ways.
2 code implementations • ICLR 2019 • Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul
We focus on one aspect in particular, namely the ability to generalise to unseen tasks.
no code implementations • 20 Jun 2017 • Diana Borsa, Bilal Piot, Rémi Munos, Olivier Pietquin
Observational learning is learning that occurs through observing, retaining, and possibly replicating or imitating the behaviour of another agent.
no code implementations • 7 Mar 2016 • Diana Borsa, Thore Graepel, John Shawe-Taylor
We investigate a multi-task reinforcement learning (MT-RL) paradigm in which an agent is placed in an environment and must learn to perform a series of tasks within that shared space.
no code implementations • 9 Jun 2015 • Diana Borsa, Thore Graepel, Andrew Gordon
We consider the problem of modelling noisy but highly symmetric shapes that can be viewed as hierarchies of whole-part relationships in which higher level objects are composed of transformed collections of lower level objects.
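A minimal data-structure sketch of that whole-part hierarchy (the names and the 2-D setting are illustrative assumptions, not the paper's model): a shape is either a primitive point set or a collection of affinely transformed sub-shapes, so highly symmetric objects arise by repeating one part under several transformations.

```python
import numpy as np

class Shape:
    """Either a primitive point set or a whole composed of transformed parts."""
    def __init__(self, points=None, parts=None):
        self.points = points      # [n, 2] array for a primitive shape
        self.parts = parts or []  # list of (A, b, sub_shape): x -> A @ x + b

    def render(self):
        if self.points is not None:
            return self.points
        # Recursively render each part, then apply its transformation.
        return np.vstack([sub.render() @ A.T + b for (A, b, sub) in self.parts])

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Four-fold rotational symmetry: one corner repeated under four rotations.
corner = Shape(points=np.array([[1.0, 0.0]]))
square = Shape(parts=[(rot(k * np.pi / 2), np.zeros(2), corner) for k in range(4)])
print(square.render())
```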