no code implementations • 6 Mar 2024 • Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal
Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions.
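The core trick behind this line of work is to replace the scalar regression target with a categorical one. A minimal sketch, assuming a "two-hot" encoding over a fixed support of return bins (the paper also studies richer target distributions; the bin range and count here are made up for illustration):

```python
import numpy as np

def two_hot(target, bins):
    """Encode a scalar regression target as a categorical distribution over
    fixed bins ("two-hot"): the probability mass is split between the two
    nearest bin centres so that the expectation recovers the target."""
    target = np.clip(target, bins[0], bins[-1])
    upper = np.searchsorted(bins, target)   # first bin centre >= target
    if upper == 0:
        probs = np.zeros(len(bins))
        probs[0] = 1.0
        return probs
    lower = upper - 1
    w_upper = (target - bins[lower]) / (bins[upper] - bins[lower])
    probs = np.zeros(len(bins))
    probs[lower], probs[upper] = 1.0 - w_upper, w_upper
    return probs

def cross_entropy_value_loss(logits, target_value, bins):
    """Cross-entropy between predicted return logits and the two-hot target,
    used in place of a squared regression loss on the value estimate."""
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -np.dot(two_hot(target_value, bins), log_probs)

bins = np.linspace(-10.0, 10.0, 51)   # hypothetical fixed return support
logits = np.zeros_like(bins)          # an untrained, uniform prediction
loss = cross_entropy_value_loss(logits, 3.7, bins)
```

Because the two-hot encoding preserves the target in expectation, the classification loss can stand in for regression without changing what the value network is asked to represent.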
no code implementations • 19 Feb 2024 • Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro
Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters.
no code implementations • 13 Feb 2024 • Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size.
1 code implementation • 23 Nov 2023 • Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin
Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research.
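The standard starting point for learning from pairwise preferences is the Bradley-Terry model: a reward model is trained so that the preferred response receives the higher score. A minimal sketch of that loss (scalar rewards stand in for a full reward model here):

```python
import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry negative log-likelihood for one pairwise preference:
    the probability that the chosen response beats the rejected one is
    sigmoid(r_chosen - r_rejected)."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), written in a numerically stable form.
    return np.log1p(np.exp(-margin))

# A reward model that already ranks the preferred answer higher incurs a
# smaller loss than one that gets the pair backwards.
good_fit = preference_loss(2.0, -1.0)
bad_fit = preference_loss(-1.0, 2.0)
```

The density-estimation view taken in the paper asks what assumptions this kind of pairwise model implicitly makes about the annotators generating the preferences.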
1 code implementation • 21 Nov 2023 • Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore
We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM).
no code implementations • 5 Oct 2023 • Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland
Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning.
no code implementations • 25 Jul 2023 • Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist
In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly.
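The proposal can be sketched in tabular form: the usual TD loss is augmented with a penalty pulling the learned Q-values toward an estimate of the behaviour policy's Q-function, rather than constraining the policy directly. This is a hypothetical tabular simplification; the paper itself works with function approximation and an actor-critic setup:

```python
import numpy as np

def regularized_td_loss(q, q_target, q_behavior, s, a, r, s2,
                        gamma=0.99, alpha=1.0):
    """Squared TD error plus a penalty pulling Q(s, a) toward an estimate
    of the behaviour policy's Q-function (e.g. one obtained by SARSA on
    the offline dataset). Tabular sketch of the idea only."""
    td_target = r + gamma * np.max(q_target[s2])
    td_error = (q[s, a] - td_target) ** 2
    behaviour_penalty = (q[s, a] - q_behavior[s, a]) ** 2
    return td_error + alpha * behaviour_penalty

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 3))     # hypothetical Q-table: 5 states, 3 actions
loss = regularized_td_loss(q, q, np.zeros((5, 3)), s=0, a=1, r=1.0, s2=2)
```

Since the behaviour Q-function is fit by on-policy SARSA-style updates, it avoids the extrapolation error that plagues off-policy evaluation of out-of-distribution actions.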
3 code implementations • 30 May 2023 • Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark.
Ranked #1 on Atari 100k
1 code implementation • 27 Apr 2023 • Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research.
1 code implementation • 25 Apr 2023 • Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare
Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks.
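The classic proto-value functions being extended here are the smoothest eigenvectors of a state-transition graph's Laplacian, used as a basis for representing value functions. A minimal sketch on a 10-state chain (the environment and basis size are made up for illustration):

```python
import numpy as np

# Build the adjacency matrix of a 10-state chain: state i <-> state i+1.
n = 10
adjacency = np.zeros((n, n))
for i in range(n - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0

# Graph Laplacian L = D - A; its low-frequency eigenvectors are the
# proto-value functions of Mahadevan & Maggioni (2007).
degree = np.diag(adjacency.sum(axis=1))
laplacian = degree - adjacency
eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
proto_value_functions = eigvecs[:, :4]          # 4 smoothest basis functions
```

The paper's contribution is a way to learn analogues of these basis functions with deep networks, where the transition graph is never enumerated explicitly.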
1 code implementation • 24 Feb 2023 • Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci
In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity.
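The diagnostic itself is simple to state: score each neuron by its mean absolute activation normalized by the layer average, and flag it dormant when the score falls below a threshold. A sketch of that criterion (the batch and threshold here are made up):

```python
import numpy as np

def dormant_fraction(activations, tau=0.0):
    """Fraction of neurons in a layer that are (near-)dormant.
    `activations`: (batch, neurons) post-ReLU activations. A neuron's
    score is its mean absolute activation normalized by the layer
    average; it is flagged dormant when the score is at or below tau."""
    mean_abs = np.abs(activations).mean(axis=0)
    scores = mean_abs / (mean_abs.mean() + 1e-9)
    return float(np.mean(scores <= tau))

batch = np.maximum(np.random.default_rng(0).normal(size=(256, 8)), 0.0)
batch[:, :2] = 0.0   # simulate two neurons that never fire
frac = dormant_fraction(batch, tau=0.0)
```

Tracking this fraction over training is what reveals the phenomenon; the paper's remedy is to periodically reinitialize the dormant units.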
1 code implementation • 17 Jun 2022 • Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro
The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision.
1 code implementation • 3 Jun 2022 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare
To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another.
no code implementations • 8 Nov 2021 • Pablo Samuel Castro
In this paper I present a study in using the losses and gradients obtained during the training of a simple function approximator as a mechanism for creating musical dissonance and visual distortion in a solo piano performance setting.
1 code implementation • NeurIPS 2021 • Georg Ostrovski, Pablo Samuel Castro, Will Dabney
Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL).
no code implementations • NeurIPS Workshop LatinX_in_AI 2021 • João Guilherme Madeira Araújo, Johan Samir Obando Ceron, Pablo Samuel Castro
Successful applications of deep reinforcement learning (deep RL) combine algorithmic design and careful hyper-parameter selection.
no code implementations • 29 Sep 2021 • Halley Young, Vincent Dumoulin, Pablo Samuel Castro, Jesse Engel, Cheng-Zhi Anna Huang
To tackle the combinatorial nature of composing features, we propose a compositional approach to steering music transformers, building on lightweight fine-tuning methods such as prefix tuning and bias tuning.
3 code implementations • NeurIPS 2021 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare
Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs.
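One of the aggregate statistics advocated in this line of work is the interquartile mean (IQM) with bootstrap confidence intervals, which is robust to outlier runs while remaining more statistically efficient than the median. A minimal sketch (the scores are invented; the paper's accompanying library handles the multi-task case):

```python
import numpy as np

def interquartile_mean(scores):
    """Mean of the middle 50% of scores: robust to outlier runs,
    more efficient than the median."""
    scores = np.sort(np.asarray(scores).ravel())
    n = len(scores)
    return float(scores[n // 4 : n - n // 4].mean())

def bootstrap_ci(scores, n_resamples=2000, seed=0):
    """Percentile bootstrap 95% confidence interval for the IQM."""
    rng = np.random.default_rng(seed)
    stats = [interquartile_mean(rng.choice(scores, size=len(scores)))
             for _ in range(n_resamples)]
    return np.percentile(stats, [2.5, 97.5])

runs = np.array([0.3, 0.5, 0.55, 0.6, 0.62, 0.7, 0.9, 3.0])  # one outlier run
iqm = interquartile_mean(runs)
low, high = bootstrap_ci(runs)
```

With only a handful of runs per method, reporting the interval rather than a single point estimate is what makes comparisons between agents meaningful.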
1 code implementation • 12 Aug 2021 • Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux
Common policy gradient methods rely on the maximization of a sequence of surrogate functions.
2 code implementations • NeurIPS 2021 • Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents.
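The flavour of distance studied here combines an immediate reward-difference term with a discounted expected distance over next-state pairs, and is computed as the fixed point of a contraction. A sketch on a small, made-up Markov chain, using the independent coupling over next states so the update stays a plain linear recursion:

```python
import numpy as np

# Hypothetical 3-state chain under a fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])   # P[x, x']: next-state distribution
r = np.array([0.0, 0.0, 1.0])     # per-state rewards
gamma = 0.9

# Fixed point of U(x, y) = |r_x - r_y| + gamma * E[U(X', Y')] with X', Y'
# drawn independently from P[x], P[y]; iterate the contraction to converge.
U = np.zeros((3, 3))
for _ in range(500):
    U = np.abs(r[:, None] - r[None, :]) + gamma * P @ U @ P.T
```

States that reach reward at similar rates end up close under this distance, which is the property the learned representations are shaped to respect.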
no code implementations • ICLR Workshop SSL-RL 2021 • Manfred Diaz, Liam Paull, Pablo Samuel Castro
We offer a novel approach to balance exploration and exploitation in reinforcement learning (RL).
2 code implementations • 2 Feb 2021 • Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro
In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible.
1 code implementation • ICLR 2021 • Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare
Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states.
2 code implementations • 20 Nov 2020 • Johan S. Obando-Ceron, Pablo Samuel Castro
Since the introduction of DQN, the vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators.
1 code implementation • 6 Nov 2020 • Pablo Samuel Castro
Since the introduction of Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] there has been a regular stream of both technical advances (e.g., Arjovsky et al. [2017]) and creative uses of these generative models (e.g., [Karras et al., 2019, Zhu et al., 2017, Jin et al., 2017]).
10 code implementations • ICML 2020 • Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen
There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model.
Ranked #1 on Sparse Learning on ImageNet
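The method's core move is a periodic connectivity update that keeps the sparsity level fixed: drop the smallest-magnitude active weights, then grow the same number of new connections where the dense gradient is largest. A sketch of one such update (shapes and the update fraction are made up; grown weights start at zero as in the paper):

```python
import numpy as np

def rigl_update(weights, mask, dense_grad, update_fraction=0.3):
    """One RigL-style drop-and-grow step on a sparse weight matrix.
    Drops the lowest-magnitude active weights and grows connections
    with the largest dense-gradient magnitude, preserving sparsity."""
    n_update = int(update_fraction * mask.sum())
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(~mask.astype(bool))
    # Drop: active weights with the smallest magnitude.
    drop = active[np.argsort(np.abs(weights.ravel()[active]))[:n_update]]
    # Grow: inactive connections with the largest gradient magnitude.
    grow = inactive[np.argsort(-np.abs(dense_grad.ravel()[inactive]))[:n_update]]
    new_mask = mask.copy().ravel()
    new_mask[drop] = 0
    new_mask[grow] = 1
    new_weights = weights.copy().ravel()
    new_weights[drop] = 0.0
    new_weights[grow] = 0.0   # grown weights are initialized to zero
    return new_weights.reshape(mask.shape), new_mask.reshape(mask.shape)

rng = np.random.default_rng(0)
mask = (rng.random((8, 8)) < 0.2).astype(int)   # ~80% sparse layer
w = rng.normal(size=(8, 8)) * mask
w2, mask2 = rigl_update(w, mask, rng.normal(size=(8, 8)))
```

Because drops and grows are balanced, the network stays at its target sparsity throughout training, which is what lets sparse training match the memory footprint of sparse inference.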
1 code implementation • 21 Nov 2019 • Pablo Samuel Castro
We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs).
no code implementations • 31 Jul 2019 • Pablo Samuel Castro, Shijian Li, Daqing Zhang
We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance.
1 code implementation • 30 Apr 2019 • Pablo Samuel Castro
The quality of outputs produced by deep generative models for music has seen a dramatic improvement in the last few years.
no code implementations • 8 Feb 2019 • Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited.
no code implementations • 31 Jan 2019 • Kory W. Mathewson, Pablo Samuel Castro, Colin Cherry, George Foster, Marc G. Bellemare
We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives.
no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.
no code implementations • 30 Jan 2019 • Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare
Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL).
1 code implementation • 17 Dec 2018 • Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman
We lessen this friction by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous Deep RL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models.
12 code implementations • 14 Dec 2018 • Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
no code implementations • 12 Nov 2018 • Pablo Samuel Castro, Maria Attarian
The use of language models for generating lyrics and poetry has received an increased interest in the last few years.