Search Results for author: Andre Barreto

Found 15 papers, 1 paper with code

Video as the New Language for Real-World Decision Making

no code implementations 27 Feb 2024 Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning.

Decision Making In-Context Learning +2

Temporal Abstraction in Reinforcement Learning with the Successor Representation

no code implementations 12 Oct 2021 Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling

In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions.

reinforcement-learning Reinforcement Learning (RL)

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations ICML Workshop URL 2021 Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while assuring that they are near optimal.

Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning

no code implementations 24 Feb 2021 Víctor Campos, Pablo Sprechmann, Steven Hansen, Andre Barreto, Steven Kapturowski, Alex Vitvitskyi, Adrià Puigdomènech Badia, Charles Blundell

We introduce Behavior Transfer (BT), a technique that leverages pre-trained policies for exploration and that is complementary to transferring neural network weights.

reinforcement-learning Reinforcement Learning (RL) +1

Discovering a set of policies for the worst case reward

no code implementations ICLR 2021 Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks.

On Efficiency in Hierarchical Reinforcement Learning

no code implementations NeurIPS 2020 Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, both in terms of statistical as well as computational efficiency.

Computational Efficiency Decision Making +4

Temporal Difference Uncertainties as a Signal for Exploration

no code implementations 5 Oct 2020 Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin, Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, Andre Barreto, Razvan Pascanu

Instead, we incorporate it as an intrinsic reward and treat exploration as a separate learning problem, induced by the agent's temporal difference uncertainties.

Disentangled Cumulants Help Successor Representations Transfer to New Tasks

no code implementations 25 Nov 2019 Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, Satinder Singh

This is in contrast to the state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer.

Transfer Learning

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

no code implementations NeurIPS 2019 Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation.

Fast Task Inference with Variational Intrinsic Successor Features

no code implementations ICLR 2020 Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih

It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018).

Composing Entropic Policies using Divergence Correction

no code implementations 5 Dec 2018 Jonathan J. Hunt, Andre Barreto, Timothy P. Lillicrap, Nicolas Heess

Composing previously mastered skills to solve novel tasks promises dramatic improvements in the data efficiency of reinforcement learning.

Continuous Control Reinforcement Learning (RL)

Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem

no code implementations 9 Jul 2018 Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy Mann, Andre Barreto

Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation.
