no code implementations • 11 Dec 2024 • Martin Klissarov, Mikael Henaff, Roberta Raileanu, Shagun Sodhani, Pascal Vincent, Amy Zhang, Pierre-Luc Bacon, Doina Precup, Marlos C. Machado, Pierluca D'Oro
Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system.
no code implementations • 27 Oct 2024 • Alex Lewandowski, Dale Schuurmans, Marlos C. Machado
We then propose deep Fourier features, which are the concatenation of a sine and cosine in every layer, and we show that this combination provides a dynamic balance between the trainability obtained through linearity and the effectiveness obtained through the nonlinearity of neural networks.
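As a rough illustration of the idea in this abstract, the sketch below builds a layer that applies a linear map and then concatenates the sine and cosine of the pre-activations. It is a minimal sketch, not the authors' implementation: the names `fourier_layer` and `init_layer`, the Gaussian initialization, and the layer sizes are all assumptions made for the example.

```python
import math
import random


def fourier_layer(x, weights, biases):
    """Linear map followed by concatenated sin and cos of the pre-activations."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    # Output width is twice the number of pre-activations.
    return [math.sin(z) for z in pre] + [math.cos(z) for z in pre]


def init_layer(n_in, n_out, rng):
    # Hypothetical initialization: Gaussian weights scaled by 1/sqrt(n_in).
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
w1, b1 = init_layer(3, 4, rng)  # 3 inputs -> 4 pre-activations -> 8 features
w2, b2 = init_layer(8, 2, rng)  # 8 inputs -> 2 pre-activations -> 4 features

h = fourier_layer([0.5, -1.0, 2.0], w1, b1)
y = fourier_layer(h, w2, b2)
print(len(h), len(y))
```

Because each pre-activation contributes both a sine and a cosine, the two features for a unit always satisfy sin² + cos² = 1, so the layer output has a fixed norm regardless of the input scale.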
1 code implementation • 18 Jun 2024 • Brett Daley, Marlos C. Machado, Martha White
The recency heuristic in reinforcement learning is the assumption that stimuli that occurred closer in time to an acquired reward should be more heavily reinforced.
no code implementations • 10 Jun 2024 • Alex Lewandowski, Michał Bortkiewicz, Saurabh Kumar, András György, Dale Schuurmans, Mateusz Ostaszewski, Marlos C. Machado
From this perspective, we derive a new spectral regularizer for continual learning that better sustains these beneficial initialization properties throughout training.
no code implementations • 6 Feb 2024 • Brett Daley, Martha White, Marlos C. Machado
Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods.
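For readers unfamiliar with the two return types named here, the following sketch computes an $n$-step return and a $\lambda$-return for a single finite episode using their standard textbook definitions. The function names, the indexing convention (`rewards[k]` follows state `k`, and `values` has one extra entry for the terminal state), and the example numbers are assumptions for illustration; the sketch is not taken from the paper.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Sum n discounted rewards from step t, then bootstrap from the value estimate."""
    T = len(rewards)
    end = min(t + n, T)  # truncate at the end of the episode
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    return g + gamma ** (end - t) * values[end]


def lambda_return(rewards, values, t, gamma, lam):
    """Lambda-return via the backward recursion
    G_k = r_k + gamma * ((1 - lam) * V(s_{k+1}) + lam * G_{k+1})."""
    T = len(rewards)
    g = values[T]  # value of the terminal state (0.0 in the example below)
    for k in reversed(range(t, T)):
        g = rewards[k] + gamma * ((1.0 - lam) * values[k + 1] + lam * g)
    return g


# Hypothetical three-step episode; values[3] is the terminal state.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]
gamma = 0.9

one_step = n_step_return(rewards, values, t=0, n=1, gamma=gamma)
full = n_step_return(rewards, values, t=0, n=3, gamma=gamma)
mixed = lambda_return(rewards, values, t=0, gamma=gamma, lam=0.5)
print(one_step, full, mixed)
```

Setting `lam=0` recovers the one-step return and `lam=1` recovers the full Monte Carlo return, so the $\lambda$-return interpolates between the two by geometrically weighting all $n$-step returns.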
no code implementations • 4 Dec 2023 • Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White
In this paper we investigate the use of reinforcement-learning-based prediction approaches for a real drinking-water treatment plant.
1 code implementation • 2 Dec 2023 • Edan Meyer, Adam White, Marlos C. Machado
In this work, we provide a thorough empirical investigation of the advantages of representing observations as vectors of categorical values within the context of reinforcement learning.
no code implementations • 30 Nov 2023 • Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado
Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience.
2 code implementations • 24 Oct 2023 • Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White
In this paper, we introduce recurrent alternatives to the transformer self-attention mechanism that offer context-independent inference cost, leverage long-range dependencies effectively, and perform well in online reinforcement learning tasks.
2 code implementations • 16 Oct 2023 • Diego Gomez, Michael Bowling, Marlos C. Machado
The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging.
no code implementations • 13 Mar 2023 • Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White, Marlos C. Machado
The ability to learn continually is essential in a complex and changing world.
1 code implementation • 26 Jan 2023 • Martin Klissarov, Marlos C. Machado
In this paper we address these limitations and show how recent results for directly approximating the eigenfunctions of the Laplacian can be leveraged to truly scale up options-based exploration.
1 code implementation • 26 Jan 2023 • Brett Daley, Martha White, Christopher Amato, Marlos C. Machado
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging.
1 code implementation • 15 Nov 2022 • Ruo Yu Tao, Adam White, Marlos C. Machado
Finally, we show that this approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time, and acts as a heuristic that facilitates longer temporal credit assignment, leading to better performance.
no code implementations • 30 Mar 2022 • Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White
In this paper we investigate the properties of representations learned by deep reinforcement learning systems.
no code implementations • 21 Mar 2022 • Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping.
no code implementations • 7 Feb 2022 • Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White
Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process.
no code implementations • 12 Oct 2021 • Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling
In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions.
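To make the successor representation concrete, the sketch below computes it for a small tabular Markov chain under a fixed policy, using the standard definition $\Psi = \sum_{t \ge 0} \gamma^t P^t$ and the fixed-point iteration $\Psi \leftarrow I + \gamma P \Psi$. The function name, the iteration count, and the two-state chain are assumptions chosen for the example, not details from the paper.

```python
def successor_representation(P, gamma, iters=500):
    """Iterate psi <- I + gamma * P @ psi until (approximately) converged.

    P[i][j] is the probability of moving from state i to state j under a
    fixed policy; psi[i][j] is the expected discounted number of visits to
    state j when starting from state i.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    psi = [row[:] for row in identity]
    for _ in range(iters):
        psi = [[identity[i][j]
                + gamma * sum(P[i][k] * psi[k][j] for k in range(n))
                for j in range(n)]
               for i in range(n)]
    return psi


# Hypothetical two-state chain that deterministically alternates states.
P = [[0.0, 1.0],
     [1.0, 0.0]]
gamma = 0.9
psi = successor_representation(P, gamma)
for row in psi:
    print([round(v, 4) for v in row])
```

Each row of the result sums to $1/(1-\gamma)$ because the transition matrix is row-stochastic, and states that are visited sooner and more often after the start state receive larger entries.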
no code implementations • 22 Sep 2021 • Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016).
1 code implementation • 12 Aug 2021 • Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux
Common policy gradient methods rely on the maximization of a sequence of surrogate functions.
no code implementations • ICML Workshop URL 2021 • Akram Erraqabi, Mingde Zhao, Marlos C. Machado, Yoshua Bengio, Sainbayar Sukhbaatar, Ludovic Denoyer, Alessandro Lazaric
In this work, we introduce a method that explicitly couples representation learning with exploration when the agent is not provided with a uniform prior over the state space.
1 code implementation • ICLR 2021 • Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare
Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states.
no code implementations • 31 Aug 2020 • Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux
Traditionally, stochastic optimization theory predicts that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.
no code implementations • NeurIPS 2020 • Dibya Ghosh, Marlos C. Machado, Nicolas Le Roux
We cast policy gradient methods as the repeated application of two operators: a policy improvement operator $\mathcal{I}$, which maps any policy $\pi$ to a better one $\mathcal{I}\pi$, and a projection operator $\mathcal{P}$, which finds the best approximation of $\mathcal{I}\pi$ in the set of realizable policies.
no code implementations • ICLR 2020 • Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Konidaris
While many option discovery methods have been proposed to accelerate exploration in reinforcement learning, they are often heuristic.
no code implementations • ICLR 2020 • Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016).
no code implementations • 6 Aug 2019 • Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE).
1 code implementation • 29 Sep 2018 • Jesse Farebrother, Marlos C. Machado, Michael Bowling
Deep reinforcement learning algorithms have shown an impressive ability to learn complex control policies in high-dimensional tasks.
2 code implementations • ICLR 2019 • Marlos C. Machado, Marc G. Bellemare, Michael Bowling
In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required.
Ranked #16 on Atari Games on Atari 2600 Venture
no code implementations • 23 Mar 2018 • Craig Sherstan, Marlos C. Machado, Patrick M. Pilarski
As a primary contribution of this work, we show that using SR-based predictions can improve sample efficiency and learning speed in a continual learning setting where new predictions are incrementally added and learned over time.
no code implementations • 11 Dec 2017 • Miao Liu, Marlos C. Machado, Gerald Tesauro, Murray Campbell
Eigenoptions (EOs) have been recently introduced as a promising idea for generating a diverse set of options through the graph Laplacian, having been shown to allow efficient exploration.
1 code implementation • ICLR 2018 • Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell
Options in reinforcement learning allow agents to hierarchically decompose a task into subtasks, having the potential to speed up learning and planning.
7 code implementations • 18 Sep 2017 • Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling
The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games.
1 code implementation • ICML 2017 • Marlos C. Machado, Marc G. Bellemare, Michael Bowling
Representation learning and option discovery are two of the biggest challenges in reinforcement learning (RL).
no code implementations • 17 Jun 2016 • Craig Sherstan, Adam White, Marlos C. Machado, Patrick M. Pilarski
Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions.
no code implementations • 25 May 2016 • Marlos C. Machado, Michael Bowling
In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress.
1 code implementation • 13 Dec 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton
Our results suggest that the true online methods indeed dominate the regular methods.
1 code implementation • 4 Dec 2015 • Yitao Liang, Marlos C. Machado, Erik Talvitie, Michael Bowling
The recently introduced Deep Q-Networks (DQN) algorithm has gained attention as one of the first successful combinations of deep neural networks and reinforcement learning.
no code implementations • 16 Oct 2014 • Marlos C. Machado, Sriram Srinivasan, Michael Bowling
In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration.
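The effect of optimistic initialization can be seen in a toy setting. The sketch below runs a purely greedy agent on a deterministic three-armed bandit, once with value estimates initialized above every possible reward and once initialized at zero. The bandit, the function name `run_greedy`, the step size, and the initial values are all assumptions for illustration and do not reproduce the method proposed in the paper.

```python
def run_greedy(rewards, init_value, alpha=0.5, steps=30):
    """Greedy action selection with incremental value updates.

    rewards[a] is the (deterministic) reward of arm a in this toy example.
    Returns how often each arm was pulled and the final value estimates.
    """
    q = [init_value] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        arm = max(range(len(q)), key=lambda a: q[a])  # ties go to the lowest index
        counts[arm] += 1
        q[arm] += alpha * (rewards[arm] - q[arm])
    return counts, q


rewards = [0.2, 0.8, 0.5]  # hypothetical arm rewards; arm 1 is best

optimistic_counts, _ = run_greedy(rewards, init_value=5.0)
zero_counts, _ = run_greedy(rewards, init_value=0.0)
print("optimistic:", optimistic_counts)
print("zero init: ", zero_counts)
```

With optimistic values, every pull lowers the estimate of the chosen arm, so the untried arms look better and are eventually selected; with zero initialization, the first arm's estimate rises above the others after one pull and the greedy agent never tries anything else.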
no code implementations • 13 Dec 2013 • Marlos C. Machado
We also presented a generic approach to deal with player modeling using ML, and we instantiated this approach to model players' preferences in the game Civilization IV.