Search Results for author: Pablo Samuel Castro

Found 38 papers, 23 papers with code

Mixture of Experts in a Mixture of RL settings

no code implementations • 26 Jun 2024 • Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro

Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity.

Self-Supervised Learning

On the consistency of hyper-parameter selection in value-based deep reinforcement learning

1 code implementation • 25 Jun 2024 • Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro

This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters.

reinforcement-learning

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

no code implementations • 6 Mar 2024 • Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions.

Atari Games regression +1
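The abstract above contrasts regression and classification targets for value learning. A minimal sketch of one such scheme, projecting a scalar return onto a fixed categorical support with a "two-hot" encoding so the network can be trained with cross-entropy (function name, support range, and bin count are illustrative, not the paper's exact formulation):

```python
import numpy as np

def two_hot(target, v_min=-10.0, v_max=10.0, num_bins=51):
    """Project a scalar value target onto a categorical distribution over
    fixed bin locations; the network is then trained with cross-entropy
    against this distribution instead of an L2 regression loss."""
    bins = np.linspace(v_min, v_max, num_bins)
    target = float(np.clip(target, v_min, v_max))
    upper = int(np.searchsorted(bins, target))  # first bin >= target
    lower = max(upper - 1, 0)
    probs = np.zeros(num_bins)
    if bins[upper] == bins[lower]:
        probs[upper] = 1.0
    else:
        # Split probability mass in proportion to proximity to each bin,
        # so the encoding preserves the target's expected value.
        width = bins[upper] - bins[lower]
        probs[lower] = (bins[upper] - target) / width
        probs[upper] = (target - bins[lower]) / width
    return bins, probs

bins, probs = two_hot(0.3)  # mass split between the bins at 0.0 and 0.4
```

The key property is that the expectation of the encoded distribution recovers the original scalar, so no value information is lost by switching to a classification loss.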

In value-based deep reinforcement learning, a pruned network is a good network

no code implementations • 19 Feb 2024 • Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters.

reinforcement-learning

Mixtures of Experts Unlock Parameter Scaling for Deep RL

1 code implementation • 13 Feb 2024 • Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size.

reinforcement-learning Self-Supervised Learning

A density estimation perspective on learning from pairwise human preferences

1 code implementation • 23 Nov 2023 • Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research.

Density Estimation

Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

1 code implementation • 21 Nov 2023 • Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore

We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM).

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

no code implementations • 5 Oct 2023 • Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning.

reinforcement-learning

Offline Reinforcement Learning with On-Policy Q-Function Regularization

no code implementations • 25 Jul 2023 • Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly.

D4RL reinforcement-learning +1
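A hedged illustration of the SARSA-style estimate the abstract refers to, in tabular form with made-up variable names (the paper itself works with function approximation):

```python
import numpy as np

def sarsa_behavior_q(dataset, n_states, n_actions, gamma=0.99, lr=0.1, epochs=100):
    """Estimate the behavior policy's Q-function from logged
    (s, a, r, s', a', done) tuples with a SARSA-style update; the next
    action a' comes from the dataset itself, so no policy improvement
    (and hence no extrapolation to unseen actions) is involved."""
    q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for s, a, r, s2, a2, done in dataset:
            target = r + (0.0 if done else gamma * q[s2, a2])
            q[s, a] += lr * (target - q[s, a])
    return q

# Toy log: a single terminal transition with reward 1.
q_beta = sarsa_behavior_q([(0, 0, 1.0, 0, 0, True)], n_states=1, n_actions=1)
```

The regularizer described in the abstract would then penalize deviation of the learned Q-function from this estimate, e.g. an added loss term proportional to `(q - q_beta) ** 2`, rather than constraining the policy directly.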

Bigger, Better, Faster: Human-level Atari with human-level efficiency

3 code implementations • 30 May 2023 • Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark.

Atari Games 100k

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

1 code implementation • 25 Apr 2023 • Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks.

Atari Games reinforcement-learning +1

The Dormant Neuron Phenomenon in Deep Reinforcement Learning

2 code implementations • 24 Feb 2023 • Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity.

reinforcement-learning Reinforcement Learning (RL)
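A rough sketch of how such inactive ("dormant") neurons might be counted over a batch of activations, assuming a convention of normalizing each neuron's activity by the layer average; the threshold and names here are illustrative, not the paper's exact definition:

```python
import numpy as np

def dormant_fraction(activations, tau=0.0):
    """Fraction of neurons whose mean absolute activation, normalized by
    the layer-wide average, falls at or below a threshold tau
    (tau=0 flags only fully inactive neurons)."""
    score = np.abs(activations).mean(axis=0)   # per-neuron activity over a batch
    score = score / (score.mean() + 1e-8)      # normalize by the layer average
    return float((score <= tau).mean())

acts = np.array([[0.0, 1.0, 2.0],
                 [0.0, 3.0, 4.0]])             # neuron 0 never fires
print(dormant_fraction(acts))
```

Tracking this quantity during training is the kind of diagnostic the abstract describes: a rising fraction signals shrinking effective network expressivity.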

The State of Sparse Training in Deep Reinforcement Learning

1 code implementation • 17 Jun 2022 • Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision.

reinforcement-learning Reinforcement Learning (RL)

Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

1 code implementation • 3 Jun 2022 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another.

Atari Games Humanoid Control +2

Losses, Dissonances, and Distortions

no code implementations • 8 Nov 2021 • Pablo Samuel Castro

In this paper I present a study in using the losses and gradients obtained during the training of a simple function approximator as a mechanism for creating musical dissonance and visual distortion in a solo piano performance setting.

The Difficulty of Passive Learning in Deep Reinforcement Learning

1 code implementation • NeurIPS 2021 • Georg Ostrovski, Pablo Samuel Castro, Will Dabney

Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL).

reinforcement-learning Reinforcement Learning (RL)

Composing Features: Compositional Model Augmentation for Steerability of Music Transformers

no code implementations • 29 Sep 2021 • Halley Young, Vincent Dumoulin, Pablo Samuel Castro, Jesse Engel, Cheng-Zhi Anna Huang

To tackle the combinatorial nature of composing features, we propose a compositional approach to steering music transformers, building on lightweight fine-tuning methods such as prefix tuning and bias tuning.

Deep Reinforcement Learning at the Edge of the Statistical Precipice

3 code implementations • NeurIPS 2021 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs.

reinforcement-learning Reinforcement Learning (RL)
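One of the tools this paper advocates in place of point estimates is the interquartile mean (IQM) with bootstrap confidence intervals. A simplified sketch (the authors' released `rliable` library provides the full stratified version):

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of runs. More robust
    to outlier runs than the mean, more statistically efficient than the
    median."""
    s = np.sort(np.asarray(scores).ravel())
    n = len(s)
    return float(s[n // 4 : n - n // 4].mean())

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the IQM."""
    rng = np.random.default_rng(seed)
    stats = [iqm(rng.choice(scores, size=len(scores), replace=True))
             for _ in range(n_boot)]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

runs = [0.1, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 5.0]  # final scores across 8 runs
point_estimate = iqm(runs)   # trims the outlier run scoring 5.0
low, high = bootstrap_ci(runs)
```

Reporting the interval alongside the point estimate is exactly the practice the abstract argues is missing from results based on a handful of training runs.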

MICo: Improved representations via sampling-based state similarity for Markov decision processes

2 code implementations • NeurIPS 2021 • Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents.

Atari Games reinforcement-learning +1
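A tabular caricature of the sampling-based idea: the distance estimate is updated from single sampled next states rather than full transition distributions. This sketch omits the diffuse self-distance term that distinguishes MICo from a standard bisimulation metric, and all names are illustrative:

```python
import numpy as np

def mico_style_update(U, x, y, rx, ry, x2, y2, gamma=0.9, lr=0.1):
    """One sample-based update of a behavioural distance estimate U:
    the target |r_x - r_y| + gamma * U(x', y') uses single sampled
    next states x' and y', so no transition model is required."""
    target = abs(rx - ry) + gamma * U[x2, y2]
    U[x, y] += lr * (target - U[x, y])
    return U

# Two self-looping states with rewards 0 and 1; the distance estimate
# converges to |0 - 1| / (1 - gamma) = 10.
U = np.zeros((2, 2))
for _ in range(5000):
    U = mico_style_update(U, 0, 1, 0.0, 1.0, 0, 1)
```

In the deep RL setting the paper targets, the same target would instead shape the distance between the agent's learned state representations.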

Metrics and continuity in reinforcement learning

2 code implementations • 2 Feb 2021 • Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro

In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible.

reinforcement-learning Reinforcement Learning (RL)

Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research

2 code implementations • 20 Nov 2020 • Johan S. Obando-Ceron, Pablo Samuel Castro

Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators.

Atari Games reinforcement-learning +1

GANterpretations

1 code implementation • 6 Nov 2020 • Pablo Samuel Castro

Since the introduction of Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] there has been a regular stream of both technical advances (e.g., Arjovsky et al. [2017]) and creative uses of these generative models (e.g., [Karras et al., 2019, Zhu et al., 2017, Jin et al., 2017]).

Rigging the Lottery: Making All Tickets Winners

10 code implementations • ICML 2020 • Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model.

Image Classification Language Modelling +1
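RigL's core connectivity update (drop the lowest-magnitude active weights, grow an equal number of inactive connections with the largest gradient magnitude, keeping total sparsity fixed) can be sketched as follows. This is a simplified single-layer version over flat arrays, not the released implementation:

```python
import numpy as np

def rigl_step(weights, grads, mask, update_frac=0.3):
    """One RigL-style connectivity update on a flat weight vector.
    `mask` is a boolean array marking active connections; its total
    count (the sparsity level) is unchanged by the update."""
    n_update = int(update_frac * mask.sum())
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(~mask)
    # Drop: active weights with the smallest magnitude.
    drop = active[np.argsort(np.abs(weights[active]))[:n_update]]
    # Grow: inactive connections with the largest gradient magnitude.
    grow = inactive[np.argsort(-np.abs(grads[inactive]))[:n_update]]
    mask = mask.copy()
    mask[drop] = False
    mask[grow] = True
    weights = weights * mask  # dropped weights are zeroed; grown ones start at zero
    return weights, mask

w = np.array([0.5, 0.01, 0.0, 0.0])
g = np.array([0.0, 0.0, 1.0, 0.1])
mask = np.array([True, True, False, False])
w, mask = rigl_step(w, g, mask, update_frac=0.5)
# The 0.01 weight is dropped; the connection with gradient 1.0 is grown.
```

Because connectivity evolves during training itself, the sparse model never needs a dense counterpart, which is the limitation the abstract highlights.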

Scalable methods for computing state similarity in deterministic Markov Decision Processes

1 code implementation • 21 Nov 2019 • Pablo Samuel Castro

We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs).
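For deterministic MDPs the bisimulation fixed point simplifies, since each (state, action) pair has a single successor. A naive sketch of that iteration follows; the paper's contribution is making such computations scalable and approximate, which this toy version is not:

```python
import numpy as np

def deterministic_bisim(R, nxt, gamma=0.9, iters=500):
    """Fixed-point iteration for a bisimulation metric in a deterministic
    MDP: d(s, t) = max_a |R[s, a] - R[t, a]| + gamma * d(nxt[s, a], nxt[t, a]).
    R is (n_states, n_actions) rewards; nxt gives the unique successor."""
    n, n_actions = R.shape
    d = np.zeros((n, n))
    for _ in range(iters):
        new = np.zeros_like(d)
        for s in range(n):
            for t in range(n):
                new[s, t] = max(abs(R[s, a] - R[t, a])
                                + gamma * d[nxt[s, a], nxt[t, a]]
                                for a in range(n_actions))
        d = new
    return d

# Two self-looping states with rewards 0 and 1: d(0, 1) = 1 / (1 - gamma) = 10.
R = np.array([[0.0], [1.0]])
nxt = np.array([[0], [1]])
d = deterministic_bisim(R, nxt)
```

States at distance zero under this metric are behaviourally indistinguishable, which is what makes it useful for state aggregation and representation learning.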

Inverse Reinforcement Learning with Multiple Ranked Experts

no code implementations • 31 Jul 2019 • Pablo Samuel Castro, Shijian Li, Daqing Zhang

We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance.

reinforcement-learning Reinforcement Learning (RL)

Performing Structured Improvisations with pre-trained Deep Learning Models

1 code implementation • 30 Apr 2019 • Pablo Samuel Castro

The quality of outputs produced by deep generative models for music has seen a dramatic improvement in the last few years.

Distributional reinforcement learning with linear function approximation

no code implementations • 8 Feb 2019 • Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra

Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited.

Distributional Reinforcement Learning reinforcement-learning +1

Shaping the Narrative Arc: An Information-Theoretic Approach to Collaborative Dialogue

no code implementations • 31 Jan 2019 • Kory W. Mathewson, Pablo Samuel Castro, Colin Cherry, George Foster, Marc G. Bellemare

We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives.

Specificity

A Comparative Analysis of Expected and Distributional Reinforcement Learning

no code implementations • 30 Jan 2019 • Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare

Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL).

Distributional Reinforcement Learning reinforcement-learning +1

An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents

1 code implementation • 17 Dec 2018 • Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman

We lessen this friction, by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous Deep RL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models.

Atari Games Friction +2

Combining Learned Lyrical Structures and Vocabulary for Improved Lyric Generation

no code implementations • 12 Nov 2018 • Pablo Samuel Castro, Maria Attarian

The use of language models for generating lyrics and poetry has received an increased interest in the last few years.

Diversity
