
1 code implementation • 16 May 2022 • Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.

Ranked #2 on Atari Games on Atari 100k

no code implementations • ICLR 2022 • Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
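The core idea — adaptation as the flow of a gradient vector field rather than a fixed number of discrete steps — can be illustrated on a toy problem. The sketch below (not the paper's method, just the underlying ODE view) integrates the gradient-flow equation dθ/dt = −∇L(θ) for a quadratic loss with explicit Euler steps and compares it with the closed-form solution; the quadratic loss and its coefficient are hypothetical choices for illustration.

```python
import math

def grad(theta, a=2.0):
    # gradient of the quadratic loss L(theta) = 0.5 * a * theta**2
    return a * theta

def euler_gradient_flow(theta0, t_end, steps, a=2.0):
    # integrate d(theta)/dt = -grad L(theta) with explicit Euler;
    # as steps grows, this recovers the continuous gradient flow
    dt = t_end / steps
    theta = theta0
    for _ in range(steps):
        theta -= dt * grad(theta, a)
    return theta

theta0, t_end = 1.0, 1.0
exact = theta0 * math.exp(-2.0 * t_end)   # closed-form flow for this quadratic
approx = euler_gradient_flow(theta0, t_end, steps=1000)
print(abs(approx - exact))
```

Ordinary gradient descent is exactly this Euler discretization with a fixed step size; the continuous-time view replaces the number of adaptation steps with an integration horizon.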

1 code implementation • 22 Feb 2022 • Nikolaus H. R. Howe, Simon Dufort-Labbé, Nitarshan Rajkumar, Pierre-Luc Bacon

We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments.

no code implementations • 22 Dec 2021 • Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal

The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden.

no code implementations • NeurIPS 2021 • Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

We find that prior approaches either assume that the environment is provided in such a tabular form -- which is highly restrictive -- or infer "local neighbourhoods" of states to run value iteration over -- for which we discover an algorithmic bottleneck effect.

no code implementations • 29 Sep 2021 • Manuel Del Verme, Pierre-Luc Bacon

We develop a multiple shooting method for learning in deep neural networks based on the Lagrangian perspective on automatic differentiation.

1 code implementation • ICML Workshop URL 2021 • Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon

The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers.

no code implementations • 10 Mar 2021 • Dilip Arumugam, Peter Henderson, Pierre-Luc Bacon

How do we formalize the challenge of credit assignment in reinforcement learning?

no code implementations • 25 Oct 2020 • Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics.
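For reference, the planning algorithm that VINs embed is classical tabular value iteration: repeatedly back up state values through the transition model and take the greedy maximum over actions. A minimal sketch on a hypothetical two-state, two-action MDP (the numbers are illustrative, not from the paper):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration.

    P: transition tensor, shape (A, S, S); R: rewards, shape (A, S).
    Returns the optimal state values.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * P @ V          # shape (A, S): one backup per action
        V_new = Q.max(axis=0)          # greedy maximum over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Hypothetical two-state, two-action MDP for illustration only.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0: stay in place
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1: switch states
R = np.array([[0.0, 1.0],                  # action 0 rewards per state
              [0.5, 0.0]])                 # action 1 rewards per state
V = value_iteration(P, R)
print(V)
```

VIN-style architectures approximate these max-and-backup sweeps with convolutions, which is what makes the "partial alignment with the computational graph of value iteration" mentioned below possible.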

no code implementations • 26 Sep 2020 • Andreea Deac, Pierre-Luc Bacon, Jian Tang

Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration.

no code implementations • 6 Jul 2020 • Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.
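With tabular features, the expected TD fixed point solves a linear system A w = b, so one can see what diagonal (Jacobi) preconditioning of the expected update looks like. The sketch below is a toy two-state illustration of that idea with hypothetical numbers — it is not the paper's sample-based setting or its experimental setup.

```python
import numpy as np

# Hypothetical two-state Markov reward process, for illustration only.
gamma = 0.9
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
mu = np.array([0.5, 0.5])          # state weighting

# With tabular features the expected TD fixed point solves A w = b.
D = np.diag(mu)
A = D @ (np.eye(2) - gamma * P)
b = D @ r

# Jacobi preconditioning: scale the expected TD update by diag(A)^-1.
M_inv = np.diag(1.0 / np.diag(A))
w = np.zeros(2)
for _ in range(500):
    w = w + M_inv @ (b - A @ w)

print(np.allclose(w, np.linalg.solve(A, b), atol=1e-6))
```

The diagonal of A contains the bootstrap term (through I − γP), which is exactly what a generic adaptive optimizer's per-parameter scaling does not account for.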

no code implementations • 26 Feb 2020 • Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states.

3 code implementations • 1 Jan 2020 • Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time.

no code implementations • 11 Dec 2019 • Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup

In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution.
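The quantity being maximized can be computed in closed form for a small Markov chain: the discounted future state distribution is d = (1 − γ) μ₀ᵀ(I − γP)⁻¹, and its entropy is the usual −Σ d log d. A minimal sketch with hypothetical transition numbers (illustration only, not the paper's estimator):

```python
import numpy as np

def discounted_state_distribution(P, mu0, gamma=0.9):
    # d^T = (1 - gamma) * mu0^T (I - gamma P)^{-1}: the normalized
    # discounted visitation frequencies of the chain P from start dist mu0.
    n = P.shape[0]
    return (1 - gamma) * np.linalg.solve((np.eye(n) - gamma * P).T, mu0)

# Chain induced by some fixed policy (hypothetical numbers, illustration only).
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
mu0 = np.array([1.0, 0.0])
d = discounted_state_distribution(P, mu0)
entropy = -np.sum(d * np.log(d))
print(d, entropy)
```

A policy whose induced chain spreads this distribution more evenly scores higher entropy, which is the exploration signal the objective rewards.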

no code implementations • 21 Oct 2019 • Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon

While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space.
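The likelihood-ratio form replaces that integral with an expectation that can be sampled: ∇J = E[r(a) ∇ log π(a)]. A minimal sketch on a one-step Gaussian-policy bandit where the exact gradient is known (the reward and policy parameters are hypothetical, chosen so the answer can be checked analytically):

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.7, 1.0
n = 200_000

# One-step bandit: Gaussian policy N(mu, sigma^2), reward r(a) = -a**2.
a = rng.normal(mu, sigma, size=n)
r = -a**2

# Likelihood-ratio (score-function) estimate of dJ/dmu, using
# grad log pi(a; mu) = (a - mu) / sigma**2 for a Gaussian policy.
grad_est = np.mean(r * (a - mu) / sigma**2)

# Analytic check: J(mu) = E[-a^2] = -(mu^2 + sigma^2), so dJ/dmu = -2*mu.
exact = -2 * mu
print(grad_est, exact)
```

The estimator never touches the integral over actions directly — it only needs samples from the policy and the score function, which is why the trick is so widely used.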

no code implementations • ICML 2020 • Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Surprisingly, we find that in finite-horizon MDPs there is no strict variance reduction from per-decision importance sampling or stationary importance sampling compared with vanilla importance sampling.
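The two estimators being compared differ only in how the per-step ratios weight each reward: vanilla IS multiplies every reward by the full trajectory product, per-decision IS weights each reward only by the ratios up to its time step. A minimal sketch on a hypothetical two-step task with independent binary actions (numbers chosen for illustration; both estimators are unbiased here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-step episodic task: one binary action per step, reward r_t = a_t.
# Uniform behavior policy, target policy preferring action 1 (hypothetical).
b = np.array([0.5, 0.5])
pi = np.array([0.2, 0.8])
gamma = 1.0
n = 500_000

a = rng.choice(2, size=(n, 2), p=b)       # sampled trajectories of two actions
rewards = a.astype(float)
rho = pi[a] / b[a]                        # per-step importance ratios, shape (n, 2)

# Vanilla (trajectory-wise) IS: the full product weights every reward.
vanilla = (rho[:, 0] * rho[:, 1]) * (rewards[:, 0] + gamma * rewards[:, 1])
# Per-decision IS: each reward weighted only by ratios up to its time step.
per_decision = rho[:, 0] * rewards[:, 0] + gamma * rho[:, 0] * rho[:, 1] * rewards[:, 1]

true_value = pi[1] * 1.0 + gamma * pi[1] * 1.0    # E_pi[r0 + r1]
print(vanilla.mean(), per_decision.mean(), true_value)
```

The surprise in the paper is about variance: dropping "future" ratios from each reward's weight does not strictly reduce variance in general, even though it intuitively should.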

no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

no code implementations • 9 Feb 2018 • Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

2 code implementations • 30 Nov 2017 • Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]).

no code implementations • 10 Nov 2017 • Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.

1 code implementation • 20 Sep 2017 • Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup

Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations.

1 code implementation • 14 Sep 2017 • Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance.

no code implementations • ICML 2018 • Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

Off-policy learning is key to scaling up reinforcement learning, as it allows an agent to learn about a target policy from the experience generated by a different behavior policy.

no code implementations • 3 Dec 2016 • Pierre-Luc Bacon, Doina Precup

We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations.
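To unpack the connection: policy evaluation solves the linear system (I − γP)v = r, and writing the system matrix as A = M − N with an easily invertible M gives the classic matrix-splitting iteration v ← M⁻¹(Nv + r) (Jacobi takes M = diag(A), Gauss–Seidel the lower triangle). A minimal sketch of the Gauss–Seidel splitting on a hypothetical two-state chain (illustration of matrix splitting in general, not of the paper's options-specific splitting):

```python
import numpy as np

# Hypothetical two-state Markov reward process, for illustration only.
gamma = 0.9
P = np.array([[0.5, 0.5],
              [0.3, 0.7]])
r = np.array([1.0, 2.0])

# Policy evaluation solves (I - gamma * P) v = r.
A = np.eye(2) - gamma * P
v_exact = np.linalg.solve(A, r)

# Matrix splitting A = M - N; Gauss-Seidel takes M = lower triangle of A.
M = np.tril(A)
N = M - A
v = np.zeros(2)
for _ in range(100):
    v = np.linalg.solve(M, N @ v + r)   # v <- M^{-1} (N v + r)

print(np.allclose(v, v_exact, atol=1e-8))
```

Standard value iteration corresponds to the crudest splitting (M = I); richer splittings reuse already-updated entries within a sweep and typically converge in fewer iterations.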

9 code implementations • 16 Sep 2016 • Pierre-Luc Bacon, Jean Harb, Doina Precup

Temporal abstraction is key to scaling up learning and planning in reinforcement learning.

1 code implementation • 19 Nov 2015 • Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup

In this paper, we use reinforcement learning as a tool to optimize conditional computation policies.

Papers With Code is a free resource with all data licensed under CC-BY-SA.