Search Results for author: Marlos C. Machado

Found 32 papers, 14 papers with code

Recurrent Linear Transformers

1 code implementation 24 Oct 2023 Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White

In this paper we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in practice.

Proper Laplacian Representation Learning

1 code implementation 16 Oct 2023 Diego Gomez, Michael Bowling, Marlos C. Machado

The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging.

Representation Learning

Deep Laplacian-based Options for Temporally-Extended Exploration

1 code implementation 26 Jan 2023 Martin Klissarov, Marlos C. Machado

In this paper we address these limitations and show how recent results for directly approximating the eigenfunctions of the Laplacian can be leveraged to truly scale up options-based exploration.

Reinforcement Learning (RL)

Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

1 code implementation 26 Jan 2023 Brett Daley, Martha White, Christopher Amato, Marlos C. Machado

Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging.

reinforcement-learning Reinforcement Learning (RL)

Agent-State Construction with Auxiliary Inputs

1 code implementation 15 Nov 2022 Ruo Yu Tao, Adam White, Marlos C. Machado

Finally, we show that this approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time, and acts as a heuristic that facilitates longer temporal credit assignment, leading to better performance.

Decision Making reinforcement-learning +1

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

no code implementations 21 Mar 2022 Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio

In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping.

Continuous Control Contrastive Learning +1

Temporal Abstraction in Reinforcement Learning with the Successor Representation

no code implementations 12 Oct 2021 Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling

In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions.

reinforcement-learning Reinforcement Learning (RL)

On Bonus-Based Exploration Methods in the Arcade Learning Environment

no code implementations 22 Sep 2021 Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016).

Montezuma's Revenge

Beyond variance reduction: Understanding the true impact of baselines on policy optimization

no code implementations 31 Aug 2020 Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Traditionally, stochastic optimization theory predicts that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.

Reinforcement Learning (RL) Stochastic Optimization

An operator view of policy gradient methods

no code implementations NeurIPS 2020 Dibya Ghosh, Marlos C. Machado, Nicolas Le Roux

We cast policy gradient methods as the repeated application of two operators: a policy improvement operator $\mathcal{I}$, which maps any policy $\pi$ to a better one $\mathcal{I}\pi$, and a projection operator $\mathcal{P}$, which finds the best approximation of $\mathcal{I}\pi$ in the set of realizable policies.

Policy Gradient Methods

On Bonus Based Exploration Methods In The Arcade Learning Environment

no code implementations ICLR 2020 Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016).

Montezuma's Revenge

Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

no code implementations 6 Aug 2019 Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE).

Benchmarking Montezuma's Revenge

Generalization and Regularization in DQN

1 code implementation 29 Sep 2018 Jesse Farebrother, Marlos C. Machado, Michael Bowling

Deep reinforcement learning algorithms have shown an impressive ability to learn complex control policies in high-dimensional tasks.

Atari Games Benchmarking +2

Count-Based Exploration with the Successor Representation

2 code implementations ICLR 2019 Marlos C. Machado, Marc G. Bellemare, Michael Bowling

In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required.

Atari Games Efficient Exploration +1

Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation

no code implementations 23 Mar 2018 Craig Sherstan, Marlos C. Machado, Patrick M. Pilarski

As a primary contribution of this work, we show that using SR-based predictions can improve sample efficiency and learning speed in a continual learning setting where new predictions are incrementally added and learned over time.

Continual Learning Reinforcement Learning (RL)

The Eigenoption-Critic Framework

no code implementations 11 Dec 2017 Miao Liu, Marlos C. Machado, Gerald Tesauro, Murray Campbell

Eigenoptions (EOs) have been recently introduced as a promising idea for generating a diverse set of options through the graph Laplacian, having been shown to allow efficient exploration.

Efficient Exploration Hierarchical Reinforcement Learning +1

Eigenoption Discovery through the Deep Successor Representation

1 code implementation ICLR 2018 Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell

Options in reinforcement learning allow agents to hierarchically decompose a task into subtasks, having the potential to speed up learning and planning.

Atari Games reinforcement-learning +2

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

7 code implementations 18 Sep 2017 Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games.

Atari Games

Introspective Agents: Confidence Measures for General Value Functions

no code implementations 17 Jun 2016 Craig Sherstan, Adam White, Marlos C. Machado, Patrick M. Pilarski

Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions.

Learning Purposeful Behaviour in the Absence of Rewards

no code implementations 25 May 2016 Marlos C. Machado, Michael Bowling

In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress.

True Online Temporal-Difference Learning

1 code implementation 13 Dec 2015 Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton

Our results suggest that the true online methods indeed dominate the regular methods.

Atari Games

State of the Art Control of Atari Games Using Shallow Reinforcement Learning

1 code implementation 4 Dec 2015 Yitao Liang, Marlos C. Machado, Erik Talvitie, Michael Bowling

The recently introduced Deep Q-Networks (DQN) algorithm has gained attention as one of the first successful combinations of deep neural networks and reinforcement learning.

Atari Games reinforcement-learning +1

Domain-Independent Optimistic Initialization for Reinforcement Learning

no code implementations 16 Oct 2014 Marlos C. Machado, Sriram Srinivasan, Michael Bowling

In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration.

reinforcement-learning Reinforcement Learning (RL)

A Methodology for Player Modeling based on Machine Learning

no code implementations 13 Dec 2013 Marlos C. Machado

We also presented a generic approach to deal with player modeling using ML, and we instantiated this approach to model players' preferences in the game Civilization IV.

BIG-bench Machine Learning Binary Classification
