no code implementations • 18 Mar 2025 • Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Samantha Work
As a corollary to this work, we find that REINFORCE's baseline parameter plays an important and unexpected role in defining dataset composition in the presence of negative examples, and is consequently critical in driving off-policy performance.
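To make the baseline's role concrete, here is a minimal numpy sketch of a REINFORCE update with a scalar baseline on a one-step softmax policy; the setup and names are illustrative, not taken from the paper. The sign of (reward - baseline) decides whether a sample pushes the taken action's probability up or down, which is how the baseline shapes the effect of negative examples.

```python
import numpy as np

def reinforce_baseline_step(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE update on a softmax policy over discrete actions.

    The weight (reward - baseline) decides whether the sampled action's
    probability is pushed up (positive) or down (negative)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad_logp = -probs            # d log pi(a) / d logits, softmax part
    grad_logp[action] += 1.0      # indicator of the taken action
    return logits + lr * (reward - baseline) * grad_logp

# With baseline=0 a zero-reward sample is ignored; with baseline=0.5 the
# same sample actively pushes probability away from the taken action.
logits = np.zeros(3)
logits = reinforce_baseline_step(logits, action=1, reward=0.0, baseline=0.5)
print(logits)
```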
no code implementations • 14 Oct 2024 • Harley Wiltzer, Marc G. Bellemare, David Meger, Patrick Shafto, Yash Jhaveri
Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown.
Distributional Reinforcement Learning
reinforcement-learning
no code implementations • 1 Jun 2024 • Nate Rahn, Pierluca D'Oro, Marc G. Bellemare
But how do LLM agents explore, and how can we control their exploratory behaviors?
1 code implementation • 13 Feb 2024 • Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.
Distributional Reinforcement Learning
Model-based Reinforcement Learning
1 code implementation • 21 Nov 2023 • Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore
We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM).
1 code implementation • NeurIPS 2023 • Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy.
no code implementations • 16 Jun 2023 • Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney
In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988).
no code implementations • 28 May 2023 • Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney
We study the problem of temporal-difference-based policy evaluation in reinforcement learning.
Distributional Reinforcement Learning
reinforcement-learning
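For reference, a minimal tabular TD(0) evaluation loop of the kind this line of work analyses; the two-state chain below is a made-up example, not from the paper.

```python
import numpy as np

def td0_evaluate(transitions, n_states, gamma=0.9, alpha=0.1):
    """Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    V = np.zeros(n_states)
    for s, r, s_next in transitions:
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
    return V

# A toy two-state chain: state 0 yields reward 1 and moves to state 1,
# which self-loops with reward 0; TD(0) should find V close to [1, 0].
stream = [(0, 1.0, 1), (1, 0.0, 1)] * 500
print(td0_evaluate(stream, n_states=2))
```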
1 code implementation • 25 Apr 2023 • Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare
Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks.
no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.
Distributional Reinforcement Learning
reinforcement-learning
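A toy numpy sketch of the quantile temporal-difference update analysed in the paper, for a single transition and a handful of quantiles; the step size and the usage example are ours.

```python
import numpy as np

def qtd_update(theta, theta_next, r, gamma=0.9, alpha=0.05):
    """One QTD step for a single transition.

    theta, theta_next: arrays of m quantile estimates of the return
    distribution at the current and next state. Each quantile level
    tau_i = (2i - 1) / (2m) moves by a bounded step depending on how many
    bootstrapped targets fall above or below the current estimate."""
    m = len(theta)
    taus = (2 * np.arange(m) + 1) / (2 * m)
    targets = r + gamma * theta_next          # one target per next-state quantile
    for i in range(m):
        # average quantile-loss subgradient over the m targets
        theta[i] += alpha * np.mean(taus[i] - (targets < theta[i]))
    return theta

theta = qtd_update(np.zeros(5), theta_next=np.zeros(5), r=1.0)
print(theta)   # higher quantile levels move up faster toward the target
```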
no code implementations • 8 Dec 2022 • Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare
In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns.
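The paper's algorithm learns the subspace from sample entries with a neural network; as a simpler classical point of reference only (explicitly not the paper's method), here is an Oja-style streaming subspace update.

```python
import numpy as np

def oja_step(W, x, lr=0.01):
    """One Oja-style update of a k-dimensional subspace estimate W (d x k)
    from a single data vector x (d,). A classical streaming baseline."""
    y = W.T @ x                                       # project sample into subspace
    W += lr * (np.outer(x, y) - W @ np.outer(y, y))   # Oja's rule
    Q, _ = np.linalg.qr(W)                            # re-orthonormalize
    return Q

rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.normal(size=(10, 3)))
for _ in range(2000):
    # data with variance concentrated in the first three coordinates
    x = rng.normal(size=10) * np.array([5, 4, 3, 1, 1, 1, 1, 1, 1, 1])
    W = oja_step(W, x)
```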
no code implementations • 15 Jul 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare
We study the multi-step off-policy learning approach to distributional RL.
Distributional Reinforcement Learning
quantile regression
1 code implementation • 3 Jun 2022 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare
To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another.
no code implementations • 24 May 2022 • Harley Wiltzer, David Meger, Marc G. Bellemare
We demonstrate the effectiveness of such an algorithm in a synthetic control problem.
1 code implementation • 1 Mar 2022 • Charline Le Lan, Stephen Tu, Adam Oberman, Rishabh Agarwal, Marc G. Bellemare
We complement our theoretical results with an empirical survey of classic representation learning methods from the literature and results on the Arcade Learning Environment, and find that the generalization behaviour of learned representations is well-explained by their effective dimension.
no code implementations • 22 Sep 2021 • Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016).
3 code implementations • NeurIPS 2021 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare
Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs.
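The paper's recommendations are implemented in its rliable library; below is only a minimal numpy sketch of one of them, the interquartile mean with a stratified-bootstrap confidence interval, with toy data in place of real benchmark scores.

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: mean of the middle 50% of scores."""
    s = np.sort(scores, axis=None)
    n = len(s)
    return s[n // 4 : n - n // 4].mean()

def stratified_bootstrap_ci(score_matrix, n_boot=2000, seed=0):
    """95% CI for the IQM over a (runs x tasks) score matrix, resampling
    runs independently within each task (stratified bootstrap)."""
    rng = np.random.default_rng(seed)
    n_runs, n_tasks = score_matrix.shape
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_runs, size=(n_runs, n_tasks))
        resampled = np.take_along_axis(score_matrix, idx, axis=0)
        stats.append(iqm(resampled))
    return np.percentile(stats, [2.5, 97.5])

scores = np.random.default_rng(1).normal(1.0, 0.3, size=(5, 26))  # 5 runs, 26 games
print(iqm(scores), stratified_bootstrap_ci(scores))
```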
2 code implementations • 2 Feb 2021 • Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro
In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible.
1 code implementation • ICLR 2021 • Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare
Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states.
1 code implementation • ICLR 2021 • Jacob Buckman, Carles Gelada, Marc G. Bellemare
To avoid this, algorithms can follow the pessimism principle, which states that we should choose the policy which acts optimally in the worst possible world.
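A small illustration of the pessimism principle using a lower-confidence-bound penalty; the sqrt-count form is one standard choice shown for illustration, not the paper's specific construction.

```python
import numpy as np

def pessimistic_q(q_mean, visit_counts, beta=1.0):
    """Pessimism via a lower confidence bound: subtract an uncertainty
    penalty that shrinks with data, so rarely-seen actions are valued as
    if the worst plausible world were true."""
    return q_mean - beta / np.sqrt(np.maximum(visit_counts, 1))

q = np.array([1.0, 1.2])   # estimated values of two actions
n = np.array([100, 2])     # action 1 looks better but is barely explored
print(np.argmax(pessimistic_q(q, n)))  # pessimism picks well-supported action 0
```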
no code implementations • ICML 2020 • Dibya Ghosh, Marc G. Bellemare
Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates.
no code implementations • 3 Jun 2020 • Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver
To test our hypothesis empirically, we augmented a standard deep RL agent with an auxiliary task of learning the value-improvement path.
no code implementations • 27 Mar 2020 • Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
no code implementations • 9 Mar 2020 • Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare
Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces.
1 code implementation • 28 Feb 2020 • William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle
Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.
no code implementations • ICLR 2020 • Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016).
no code implementations • 28 Nov 2019 • Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare
Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.
no code implementations • 6 Aug 2019 • Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE).
no code implementations • 6 Jun 2019 • Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare
We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment.
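A simplified sketch of the two DeepMDP objectives on a single transition, with a plain L2 surrogate standing in for the metric used in the paper's analysis; the names and toy numbers are illustrative.

```python
import numpy as np

def deepmdp_losses(phi_next, model_next, model_reward, reward):
    """Simplified DeepMDP objectives on one transition: predict the reward
    and the *latent* next state. L2 stands in for the Wasserstein-based
    quantities of the paper's analysis."""
    reward_loss = (model_reward - reward) ** 2
    transition_loss = np.sum((model_next - phi_next) ** 2)
    return reward_loss + transition_loss

# phi_next: encoder output at s'; model_next / model_reward: latent model
# predictions made from the encoding of (s, a).
print(deepmdp_losses(np.array([0.1, 0.2]), np.array([0.0, 0.25]), 0.9, 1.0))
```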
no code implementations • 21 Feb 2019 • Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney
We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution.
Distributional Reinforcement Learning
reinforcement-learning
1 code implementation • ICLR 2020 • William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle
Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process.
no code implementations • 8 Feb 2019 • Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited.
Distributional Reinforcement Learning
reinforcement-learning
1 code implementation • 1 Feb 2019 • Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making.
no code implementations • 31 Jan 2019 • Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare
We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes.
no code implementations • 31 Jan 2019 • Kory W. Mathewson, Pablo Samuel Castro, Colin Cherry, George Foster, Marc G. Bellemare
We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives.
no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.
no code implementations • 30 Jan 2019 • Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare
Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL).
Distributional Reinforcement Learning
reinforcement-learning
no code implementations • 27 Jan 2019 • Carles Gelada, Marc G. Bellemare
We complement our analysis with an empirical evaluation of the two techniques in an off-policy setting on the game Pong from the Atari domain, where we find discounted COP-TD to be better behaved in practice than the soft normalization penalty.
1 code implementation • 17 Dec 2018 • Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman
We lessen this friction by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous Deep RL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models.
13 code implementations • 14 Dec 2018 • Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
5 code implementations • 30 Nov 2018 • Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, Joelle Pineau
Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning.
no code implementations • 29 Aug 2018 • Adrien Ali Taïga, Aaron Courville, Marc G. Bellemare
Next, we show how a given density model can be related to an abstraction and that the corresponding pseudo-count bonus can act as a substitute in MBIE-EB combined with this abstraction, but may lead to either under- or over-exploration.
2 code implementations • ICLR 2019 • Marlos C. Machado, Marc G. Bellemare, Michael Bowling
In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required.
Ranked #16 on Atari Games on Atari 2600 Venture
no code implementations • 22 Feb 2018 • Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh
Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance.
Distributional Reinforcement Learning
reinforcement-learning
17 code implementations • 27 Oct 2017 • Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos
In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.
Ranked #1 on Atari Games on Atari 2600 Pong
7 code implementations • 18 Sep 2017 • Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling
The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games.
22 code implementations • ICML 2017 • Marc G. Bellemare, Will Dabney, Rémi Munos
We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning.
Ranked #4 on Atari Games on Atari 2600 HERO
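At the core of the C51 agent proposed in this paper is a categorical Bellman backup; below is a minimal numpy sketch of its projection step, with the support bounds and atom count chosen for illustration.

```python
import numpy as np

def categorical_projection(p_next, r, gamma, z):
    """C51-style distributional Bellman backup: shift the support by
    r + gamma * z, then project back onto the fixed atoms z by splitting
    each atom's probability between its two nearest neighbours."""
    v_min, v_max = z[0], z[-1]
    dz = z[1] - z[0]
    tz = np.clip(r + gamma * z, v_min, v_max)     # shifted, clipped atoms
    b = (tz - v_min) / dz                         # fractional atom positions
    low, up = np.floor(b).astype(int), np.ceil(b).astype(int)
    m = np.zeros_like(z)
    for j in range(len(z)):
        if low[j] == up[j]:                       # lands exactly on an atom
            m[low[j]] += p_next[j]
        else:
            m[low[j]] += p_next[j] * (up[j] - b[j])
            m[up[j]] += p_next[j] * (b[j] - low[j])
    return m

z = np.linspace(-10, 10, 51)                      # the "51" in C51
p = np.full(51, 1 / 51)                           # uniform next-state distribution
target = categorical_projection(p, r=1.0, gamma=0.99, z=z)
print(target.sum())                               # probabilities still sum to 1
```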
2 code implementations • ICLR 2018 • Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos
We show that the Cramér distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences.
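For two empirical distributions the Cramér distance is the L2 norm of the difference of their CDFs; here is a small numpy sketch of that computation, our own and purely illustrative.

```python
import numpy as np

def cramer_distance(xs, ys):
    """Cramér distance between two empirical distributions: the L2 norm of
    the difference of their step CDFs, integrated over the real line."""
    grid = np.sort(np.concatenate([xs, ys]))
    widths = np.diff(grid)                        # lengths of CDF steps
    fx = np.searchsorted(np.sort(xs), grid[:-1], side="right") / len(xs)
    fy = np.searchsorted(np.sort(ys), grid[:-1], side="right") / len(ys)
    return np.sqrt(np.sum((fx - fy) ** 2 * widths))

print(cramer_distance(np.array([0.0, 1.0]), np.array([0.0, 1.0])))  # 0.0
print(cramer_distance(np.array([0.0]), np.array([1.0])))            # 1.0
```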
no code implementations • ICML 2017 • Alex Graves, Marc G. Bellemare, Jacob Menick, Remi Munos, Koray Kavukcuoglu
We introduce a method for automatically selecting the path, or syllabus, that a neural network follows through a curriculum so as to maximise learning efficiency.
1 code implementation • ICML 2017 • Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos
This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge.
Ranked #9 on Atari Games on Atari 2600 Montezuma's Revenge
1 code implementation • ICML 2017 • Marlos C. Machado, Marc G. Bellemare, Michael Bowling
Representation learning and option discovery are two of the biggest challenges in reinforcement learning (RL).
3 code implementations • NeurIPS 2016 • Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare
In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning.
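The paper's central algorithm, Retrace(λ), truncates the importance weights at 1 to keep variance bounded while remaining safe for arbitrary behaviour policies. Below is a minimal numpy sketch of its off-policy correction for a single trajectory; the inputs are toy values.

```python
import numpy as np

def retrace_correction(q_sa, exp_q_next, rewards, pi, mu, gamma=0.99, lam=1.0):
    """Retrace(lambda) correction for one trajectory.

    q_sa[t]      : Q(x_t, a_t)
    exp_q_next[t]: E_{a ~ pi} Q(x_{t+1}, a)
    pi, mu       : target / behaviour probabilities of the taken actions
    Uses truncated importance weights c_t = lam * min(1, pi_t / mu_t)."""
    c = lam * np.minimum(1.0, pi / mu)
    correction, coeff = 0.0, 1.0       # coeff = gamma^t * c_1 * ... * c_t
    for t in range(len(rewards)):
        if t > 0:
            coeff *= gamma * c[t]
        td_error = rewards[t] + gamma * exp_q_next[t] - q_sa[t]
        correction += coeff * td_error
    return correction

print(retrace_correction(
    q_sa=np.zeros(3), exp_q_next=np.zeros(3), rewards=np.ones(3),
    pi=np.array([0.5, 0.5, 0.5]), mu=np.array([0.9, 0.9, 0.9])))
```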
1 code implementation • NeurIPS 2016 • Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos
We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations.
Ranked #10 on Atari Games on Atari 2600 Montezuma's Revenge
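The paper derives a pseudo-count from a density model's probability of a state before and after one more observation of it. A minimal sketch of that formula and the resulting exploration bonus follows; the bonus coefficient and the 0.01 stabilizer are details we recall from the paper, and the toy numbers are ours.

```python
import numpy as np

def pseudo_count(rho, rho_prime):
    """Pseudo-count derived from a density model: rho is the model's
    probability of x, rho_prime its probability after one more observation
    of x. Requires rho_prime > rho (a 'learning-positive' model)."""
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(rho, rho_prime, beta=0.05):
    """Count-based bonus added to the environment reward."""
    return beta / np.sqrt(pseudo_count(rho, rho_prime) + 0.01)

# If the model assigns x probability 0.10, rising to 0.11 after an update,
# the pseudo-count is roughly 8.9 observations' worth of evidence.
print(pseudo_count(0.10, 0.11), exploration_bonus(0.10, 0.11))
```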
no code implementations • 16 Feb 2016 • Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Remi Munos
We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities.
2 code implementations • 15 Dec 2015 • Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos
Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.
Ranked #1 on Atari Games on Atari 2600 Elevator Action
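The consistent Bellman operator introduced here bootstraps from Q(x, a) itself at self-transitions rather than from max_b Q(x, b), which increases the action gap while preserving the optimal policy. A tabular numpy sketch, with array shapes and names of our choosing:

```python
import numpy as np

def consistent_bellman(Q, R, P, gamma=0.99):
    """Consistent Bellman operator on tabular quantities.

    Q: (S, A) values, R: (S, A) rewards, P: (S, A, S) transition kernel.
    At next state x' == x, the bootstrap value is Q(x, a) itself instead
    of max_b Q(x, b)."""
    S, A = Q.shape
    v = Q.max(axis=1)                    # max_b Q(x', b) for all x'
    out = np.empty_like(Q)
    for x in range(S):
        for a in range(A):
            boot = v.copy()
            boot[x] = Q[x, a]            # the "consistent" self-loop term
            out[x, a] = R[x, a] + gamma * P[x, a] @ boot
    return out

rng = np.random.default_rng(0)
Q, R = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
P = rng.dirichlet(np.ones(3), size=(3, 2))   # valid transition kernel
print(consistent_bellman(Q, R, P))
```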
8 code implementations • 25 Feb 2015 • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters.
Ranked #20 on Atari Games on Atari 2600 Ice Hockey
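For reference, the DQN bootstrap target at the heart of this agent, in a few lines of numpy; the batch layout and names are illustrative.

```python
import numpy as np

def dqn_targets(rewards, dones, q_next_target, gamma=0.99):
    """Standard DQN regression targets: y = r + gamma * max_a Q_target(s', a),
    with bootstrapping cut off at terminal transitions."""
    return rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)

batch_r = np.array([1.0, 0.0])
batch_done = np.array([0.0, 1.0])
q_next = np.array([[0.5, 2.0], [3.0, 1.0]])   # target-network values at s'
print(dqn_targets(batch_r, batch_done, q_next))  # [2.98, 0.0]
```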
no code implementations • 19 Nov 2014 • Joel Veness, Marc G. Bellemare, Marcus Hutter, Alvin Chua, Guillaume Desjardins
This paper describes a new information-theoretic policy evaluation technique for reinforcement learning.
3 code implementations • 19 Jul 2012 • Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling
We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning.
Ranked #1 on Atari Games on Atari 2600 Pooyan