Search Results for author: Pierre-Luc Bacon

Found 26 papers, 9 papers with code

The Primacy Bias in Deep Reinforcement Learning

1 code implementation 16 May 2022 Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.

Atari Games 100k reinforcement-learning

Continuous-Time Meta-Learning with Forward Mode Differentiation

no code implementations ICLR 2022 Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.

Few-Shot Image Classification Meta-Learning

Myriad: a real-world testbed to bridge trajectory optimization and deep learning

1 code implementation 22 Feb 2022 Nikolaus H. R. Howe, Simon Dufort-Labbé, Nitarshan Rajkumar, Pierre-Luc Bacon

We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments.

Direct Behavior Specification via Constrained Reinforcement Learning

no code implementations 22 Dec 2021 Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal

The standard formulation of Reinforcement Learning lacks a practical way of specifying what are admissible and forbidden behaviors.

Continuous Control reinforcement-learning

Neural Algorithmic Reasoners are Implicit Planners

no code implementations NeurIPS 2021 Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

We find that prior approaches either assume that the environment is provided in such a tabular form -- which is highly restrictive -- or infer "local neighbourhoods" of states to run value iteration over -- for which we discover an algorithmic bottleneck effect.

Self-Supervised Learning

DistProp: A Scalable Approach to Lagrangian Training via Distributional Approximation

no code implementations 29 Sep 2021 Manuel Del Verme, Pierre-Luc Bacon

We develop a multiple shooting method for learning in deep neural networks based on the Lagrangian perspective on automatic differentiation.

Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

1 code implementation ICML Workshop URL 2021 Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon

The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers.

Model-based Reinforcement Learning reinforcement-learning

XLVIN: eXecuted Latent Value Iteration Nets

no code implementations 25 Oct 2020 Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics.

Graph Representation Learning reinforcement-learning

Graph neural induction of value iteration

no code implementations 26 Sep 2020 Andreea Deac, Pierre-Luc Bacon, Jian Tang

Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration.


TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

no code implementations 6 Jul 2020 Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.

Policy Evaluation Networks

no code implementations 26 Feb 2020 Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states.


Options of Interest: Temporal Abstraction with Interest Functions

3 code implementations 1 Jan 2020 Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time.

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods

no code implementations 11 Dec 2019 Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup

In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution.

Policy Gradient Methods

All-Action Policy Gradient Methods: A Numerical Integration Approach

no code implementations 21 Oct 2019 Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon

While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space.

Continuous Control Numerical Integration

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

no code implementations ICML 2020 Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Surprisingly, we find that in finite horizon MDPs there is no strict variance reduction of per-decision importance sampling or stationary importance sampling, compared with vanilla importance sampling.

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations 16 Nov 2018 Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Learning Robust Options

no code implementations 9 Feb 2018 Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

Learnings Options End-to-End for Continuous Action Tasks

2 code implementations 30 Nov 2017 Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]).

Learning with Options that Terminate Off-Policy

no code implementations 10 Nov 2017 Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.

When Waiting is not an Option : Learning Options with a Deliberation Cost

1 code implementation 14 Sep 2017 Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance.

Atari Games

Convergent Tree Backup and Retrace with Function Approximation

no code implementations ICML 2018 Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

Off-policy learning is key to scaling up reinforcement learning as it allows learning about a target policy from the experience generated by a different behavior policy.

A Matrix Splitting Perspective on Planning with Options

no code implementations 3 Dec 2016 Pierre-Luc Bacon, Doina Precup

We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations.

The Option-Critic Architecture

9 code implementations 16 Sep 2016 Pierre-Luc Bacon, Jean Harb, Doina Precup

Temporal abstraction is key to scaling up learning and planning in reinforcement learning.


Conditional Computation in Neural Networks for faster models

1 code implementation 19 Nov 2015 Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup

In this paper, we use reinforcement learning as a tool to optimize conditional computation policies.

