You need to log in to edit.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

1 code implementation • 13 Feb 2024 • Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.

Distributional Reinforcement Learning
Model-based Reinforcement Learning
**+1**

1 code implementation • 21 Nov 2023 • Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore

We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM).

no code implementations • 16 Jun 2023 • Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988).

no code implementations • 28 May 2023 • Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney

We study the problem of temporal-difference-based policy evaluation in reinforcement learning.

Distributional Reinforcement Learning reinforcement-learning

1 code implementation • 25 Apr 2023 • Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks.

no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.

Distributional Reinforcement Learning
reinforcement-learning
**+1**

no code implementations • 8 Dec 2022 • Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare

In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns.

no code implementations • 15 Jul 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

We study the multi-step off-policy learning approach to distributional RL.

Distributional Reinforcement Learning
reinforcement-learning
**+1**

1 code implementation • 3 Jun 2022 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e. g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another.

no code implementations • 24 May 2022 • Harley Wiltzer, David Meger, Marc G. Bellemare

We demonstrate the effectiveness of such an algorithm in a synthetic control problem.

1 code implementation • 1 Mar 2022 • Charline Le Lan, Stephen Tu, Adam Oberman, Rishabh Agarwal, Marc G. Bellemare

We complement our theoretical results with an empirical survey of classic representation learning methods from the literature and results on the Arcade Learning Environment, and find that the generalization behaviour of learned representations is well-explained by their effective dimension.

no code implementations • 22 Sep 2021 • Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016).

3 code implementations • NeurIPS 2021 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs.

2 code implementations • 2 Feb 2021 • Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro

In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible.

1 code implementation • ICLR 2021 • Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare

Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states.

1 code implementation • ICLR 2021 • Jacob Buckman, Carles Gelada, Marc G. Bellemare

To avoid this, algorithms can follow the pessimism principle, which states that we should choose the policy which acts optimally in the worst possible world.

no code implementations • ICML 2020 • Dibya Ghosh, Marc G. Bellemare

Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates.

no code implementations • 3 Jun 2020 • Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

To test our hypothesis empirically, we augmented a standard deep RL agent with an auxiliary task of learning the value-improvement path.

no code implementations • 27 Mar 2020 • Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.

no code implementations • 9 Mar 2020 • Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare

Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces.

1 code implementation • 28 Feb 2020 • William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle

Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.

no code implementations • ICLR 2020 • Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016).

no code implementations • 28 Nov 2019 • Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare

Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.

no code implementations • 6 Aug 2019 • Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE).

no code implementations • 6 Jun 2019 • Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment.

no code implementations • 21 Feb 2019 • Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney

We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution.

Distributional Reinforcement Learning
reinforcement-learning
**+1**

1 code implementation • ICLR 2020 • William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle

Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process.

no code implementations • 8 Feb 2019 • Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra

Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited.

Distributional Reinforcement Learning
reinforcement-learning
**+1**

1 code implementation • 1 Feb 2019 • Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling

From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making.

no code implementations • 31 Jan 2019 • Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes.

no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.

no code implementations • 31 Jan 2019 • Kory W. Mathewson, Pablo Samuel Castro, Colin Cherry, George Foster, Marc G. Bellemare

We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives.

no code implementations • 30 Jan 2019 • Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare

Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL).

Distributional Reinforcement Learning
reinforcement-learning
**+1**

no code implementations • 27 Jan 2019 • Carles Gelada, Marc G. Bellemare

We complement our analysis with an empirical evaluation of the two techniques in an off-policy setting on the game Pong from the Atari domain where we find discounted COP-TD to be better behaved in practice than the soft normalization penalty.

1 code implementation • 17 Dec 2018 • Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman

We lessen this friction, by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous Deep RL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models.

12 code implementations • 14 Dec 2018 • Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

3 code implementations • 30 Nov 2018 • Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, Joelle Pineau

Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning.

no code implementations • 29 Aug 2018 • Adrien Ali Taïga, Aaron Courville, Marc G. Bellemare

Next, we show how a given density model can be related to an abstraction and that the corresponding pseudo-count bonus can act as a substitute in MBIE-EB combined with this abstraction, but may lead to either under- or over-exploration.

2 code implementations • ICLR 2019 • Marlos C. Machado, Marc G. Bellemare, Michael Bowling

In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required.

Ranked #16 on Atari Games on Atari 2600 Venture

no code implementations • 22 Feb 2018 • Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance.

Distributional Reinforcement Learning
reinforcement-learning
**+1**

17 code implementations • 27 Oct 2017 • Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos

In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.

Ranked #1 on Atari Games on Atari 2600 Pong

7 code implementations • 18 Sep 2017 • Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games.

22 code implementations • ICML 2017 • Marc G. Bellemare, Will Dabney, Rémi Munos

We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning.

Ranked #4 on Atari Games on Atari 2600 HERO

2 code implementations • ICLR 2018 • Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos

We show that the Cram\'er distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences.

no code implementations • ICML 2017 • Alex Graves, Marc G. Bellemare, Jacob Menick, Remi Munos, Koray Kavukcuoglu

We introduce a method for automatically selecting the path, or syllabus, that a neural network follows through a curriculum so as to maximise learning efficiency.

1 code implementation • ICML 2017 • Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos

This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge.

Ranked #9 on Atari Games on Atari 2600 Montezuma's Revenge

1 code implementation • ICML 2017 • Marlos C. Machado, Marc G. Bellemare, Michael Bowling

Representation learning and option discovery are two of the biggest challenges in reinforcement learning (RL).

3 code implementations • NeurIPS 2016 • Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning.

1 code implementation • NeurIPS 2016 • Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos

We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations.

Ranked #10 on Atari Games on Atari 2600 Montezuma's Revenge

no code implementations • 16 Feb 2016 • Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Remi Munos

We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities.

2 code implementations • 15 Dec 2015 • Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.

Ranked #1 on Atari Games on Atari 2600 Elevator Action

7 code implementations • 25 Feb 2015 • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg1 & Demis Hassabis

We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters.

no code implementations • 19 Nov 2014 • Joel Veness, Marc G. Bellemare, Marcus Hutter, Alvin Chua, Guillaume Desjardins

This paper describes a new information-theoretic policy evaluation technique for reinforcement learning.

3 code implementations • 19 Jul 2012 • Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling

We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning.

Ranked #1 on Atari Games on Atari 2600 Pooyan

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.