no code implementations • 27 Feb 2024 • Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans
Moreover, we demonstrate how, like language models, video generation models can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning, and reinforcement learning.
no code implementations • 12 Oct 2021 • Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling
In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions.
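In the tabular case, the SR has a simple closed form: for a fixed policy with state-transition matrix P^pi, the matrix (I - gamma * P^pi)^{-1} gives, for each starting state, the expected discounted number of future visits to every other state. A minimal NumPy sketch (the function name and toy chain are illustrative, not from the paper):

```python
import numpy as np

def successor_representation(P_pi, gamma=0.9):
    """Closed-form SR for a tabular MDP under a fixed policy pi.
    Entry (s, s') is the expected discounted number of future visits
    to s' when starting from s: (I - gamma * P_pi)^{-1}."""
    n = P_pi.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P_pi)

# Toy 3-state chain: 0 -> 1 -> 2, with state 2 absorbing under pi.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
print(successor_representation(P))  # each row is a state's visitation profile
```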
no code implementations • ICML 2021 Workshop on Unsupervised Reinforcement Learning (URL) • Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh
We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of successor features while ensuring that they remain near-optimal.
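As a rough illustration of the objective (not the paper's algorithm), one could score a set of policies by how spread out their successor-feature vectors are while filtering for near-optimality; all names below are hypothetical:

```python
import numpy as np
from itertools import combinations

def diversity(sf_vectors):
    """Minimum pairwise distance between successor-feature vectors:
    larger means the policy set is more spread out in SF space."""
    return min(np.linalg.norm(a - b) for a, b in combinations(sf_vectors, 2))

def near_optimal_subset(policies, values, v_star, eps=0.05):
    """Keep only policies whose value is within eps of the optimum."""
    return [p for p, v in zip(policies, values) if v >= v_star - eps]
```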
no code implementations • 24 Feb 2021 • Víctor Campos, Pablo Sprechmann, Steven Hansen, Andre Barreto, Steven Kapturowski, Alex Vitvitskyi, Adrià Puigdomènech Badia, Charles Blundell
We introduce Behavior Transfer (BT), a technique that leverages pre-trained policies for exploration and that is complementary to transferring neural network weights.
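A minimal sketch of the general idea, assuming a simple mixing scheme in which the agent occasionally acts with the pre-trained policy (the mixing rule and the parameter beta are hypothetical, not the paper's mechanism):

```python
import random

def bt_act(new_policy, pretrained_policy, s, beta=0.3):
    """Behavior-transfer-style exploration sketch: with probability beta,
    follow a pre-trained policy instead of the learning policy."""
    return pretrained_policy(s) if random.random() < beta else new_policy(s)
```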
no code implementations • ICLR 2021 • Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh
Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting set-max policy (SMP) on the set of tasks.
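A sketch of the robust objective being maximized, assuming a precomputed matrix values[k, t] of policy values per task; the greedy loop illustrates the worst-case criterion, not the paper's policy iteration:

```python
import numpy as np

def greedy_worst_case_set(values, budget):
    """Greedily grow a policy set to maximize the SMP's worst-case
    performance. values[k, t] is the value of policy k on task t;
    the SMP picks, per task, the best policy in the set, so its
    per-task value is a column-wise max over the chosen rows."""
    chosen = []
    for _ in range(budget):
        best_k, best_score = None, -np.inf
        for k in range(values.shape[0]):
            if k in chosen:
                continue
            cand = chosen + [k]
            score = values[cand].max(axis=0).min()  # worst task under the SMP
            if score > best_score:
                best_k, best_score = k, score
        chosen.append(best_k)
    return chosen
```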
no code implementations • NeurIPS 2020 • Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh
Hierarchical Reinforcement Learning (HRL) approaches promise more efficient solutions to sequential decision-making problems, in terms of both statistical and computational efficiency.
no code implementations • 5 Oct 2020 • Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin, Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, Andre Barreto, Razvan Pascanu
Instead, we incorporate it as an intrinsic reward and treat exploration as a separate learning problem, induced by the agent's temporal difference uncertainties.
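One simple way to realize such a bonus, sketched under the assumption that TD uncertainty is approximated by disagreement among an ensemble of value functions (a stand-in for the paper's estimator):

```python
import numpy as np

def td_uncertainty_bonus(value_ensemble, s_next, r, gamma=0.99):
    """Intrinsic reward from disagreement among an ensemble's TD targets:
    a hypothetical proxy for the agent's temporal-difference uncertainty."""
    targets = np.array([r + gamma * v[s_next] for v in value_ensemble])
    return targets.std()
```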
no code implementations • 25 Nov 2019 • Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, Satinder Singh
This is in contrast to state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer.
no code implementations • 8 Jul 2019 • Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto
We consider a general class of non-linear Bellman equations.
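As a point of reference, one way such a class can be written (a sketch, not necessarily the paper's exact formulation) replaces the affine backup of the standard Bellman equation with a generic function f of the reward and the next state's value:

```latex
% Standard Bellman equation for the value of a policy pi:
%   v(s) = E[ R_{t+1} + gamma * v(S_{t+1}) | S_t = s ]
% Non-linear generalization (sketch): replace the affine backup with f.
\[
  v(s) \;=\; \mathbb{E}\!\left[\, f\big(R_{t+1},\, v(S_{t+1})\big) \;\middle|\; S_t = s \right],
  \qquad f(r, v) = r + \gamma v \ \text{recovers the linear case.}
\]
```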
no code implementations • NeurIPS 2019 • Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu
We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation.
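The two estimators differ only in their regression target; a minimal tabular sketch of both updates (names illustrative):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """TD(0): bootstrap from the current estimate at the next state."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def mc_update(V, s, G, alpha=0.1):
    """Monte Carlo: regress directly toward the observed full return G."""
    V[s] += alpha * (G - V[s])
```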
no code implementations • ICLR 2020 • Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih
It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018).
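In these approaches a skill-conditioned policy is rewarded for reaching states from which a learned discriminator can identify its latent skill. A minimal sketch with a uniform prior over skills; the discriminator producing the logits is assumed, not specified here:

```python
import numpy as np

def skill_discriminability_reward(logits, z, n_skills):
    """log q(z | s) - log p(z) with a uniform prior over skills:
    positive when the discriminator recognizes skill z from the
    current state better than chance."""
    log_q = logits - np.logaddexp.reduce(logits)  # log-softmax
    return log_q[z] - np.log(1.0 / n_skills)
```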
no code implementations • 5 Dec 2018 • Jonathan J. Hunt, Andre Barreto, Timothy P. Lillicrap, Nicolas Heess
Composing previously mastered skills to solve novel tasks promises dramatic improvements in the data efficiency of reinforcement learning.
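One standard mechanism for this kind of composition is generalized policy improvement (GPI): act greedily with respect to the maximum over the Q-functions of previously learned skills. The paper studies composition for entropy-regularized policies, so the sketch below is a simplified variant rather than its exact method:

```python
import numpy as np

def gpi_action(q_functions, s):
    """GPI sketch: take the action that is best under the elementwise
    max over the Q-functions of previously learned skills.
    Each q in q_functions is an array of shape (num_states, num_actions)."""
    q_max = np.max([q[s] for q in q_functions], axis=0)
    return int(np.argmax(q_max))
```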
no code implementations • 9 Jul 2018 • Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy Mann, Andre Barreto
Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation.
no code implementations • NeurIPS 2017 • Zhongwen Xu, Joseph Modayil, Hado P. van Hasselt, Andre Barreto, David Silver, Tom Schaul
Neural networks have a smooth initial inductive bias, such that small changes in input do not lead to large changes in output.
1 code implementation • ICML 2017 • David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning.