Search Results for author: Romain Laroche

Found 47 papers, 18 papers with code

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

1 code implementation 30 Sep 2023 Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio

Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations.

Decision Making Model-based Reinforcement Learning +2

Think Before You Act: Decision Transformers with Internal Working Memory

1 code implementation 24 May 2023 Jikun Kang, Romain Laroche, Xingdi Yuan, Adam Trischler, Xue Liu, Jie Fu

We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training.

Atari Games Decision Making +2

Behavior Prior Representation learning for Offline Reinforcement Learning

1 code implementation 2 Nov 2022 Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet des Combes, Romain Laroche

Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm.

Offline RL reinforcement-learning +2

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

no code implementations 1 Nov 2022 Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives.

reinforcement-learning Reinforcement Learning (RL)

Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication

no code implementations 3 Oct 2022 Tristan Karch, Yoann Lemesle, Romain Laroche, Clément Moulin-Frier, Pierre-Yves Oudeyer

In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel.

Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning

no code implementations 2 Jun 2022 David Brandfonbrener, Remi Tachet des Combes, Romain Laroche

In this work, we develop a novel method for incorporating scalable uncertainty estimates into an offline reinforcement learning algorithm called deep-SPIBB that extends the SPIBB family of algorithms to environments with larger state and action spaces.

reinforcement-learning Reinforcement Learning (RL)

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

no code implementations 2 Jun 2022 Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.

Domain Generalization Self-Supervised Learning

When does return-conditioned supervised learning work for offline reinforcement learning?

1 code implementation 2 Jun 2022 David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna

Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL).

D4RL reinforcement-learning +1

Non-Markovian policies occupancy measures

no code implementations 27 May 2022 Romain Laroche, Remi Tachet des Combes, Jacob Buckman

A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent's actions are chosen from a memoryless probability distribution, conditioned only on its current state.

reinforcement-learning Reinforcement Learning (RL)

Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms

no code implementations 15 Feb 2022 Romain Laroche, Remi Tachet

To increase the unlearning speed, we study a novel policy update: the gradient of the cross-entropy loss with respect to the action maximizing $q$, but find that such updates may lead to a decrease in value.

On the Convergence of SARSA with Linear Function Approximation

no code implementations 14 Feb 2022 Shangtong Zhang, Remi Tachet, Romain Laroche

SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region.

Dr Jekyll & Mr Hyde: the strange case of off-policy policy updates

no code implementations NeurIPS 2021 Romain Laroche, Remi Tachet des Combes

To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (J&H), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores.

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

1 code implementation NeurIPS 2023 Shangtong Zhang, Remi Tachet, Romain Laroche

In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy.

Policy Gradient Methods

Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates

1 code implementation 29 Sep 2021 Romain Laroche, Remi Tachet

To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (JH), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores.

Batched Bandits with Crowd Externalities

no code implementations 29 Sep 2021 Romain Laroche, Othmane Safsafi, Raphael Feraud, Nicolas Broutin

In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step.

Multi-Armed Bandits

Learnability and Expressiveness in Self-Supervised Learning

no code implementations 29 Sep 2021 Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville

In this work, we argue that representations induced by self-supervised learning (SSL) methods should both be expressive and learnable.

Data Augmentation Self-Supervised Learning

The Emergence of the Shape Bias Results from Communicative Efficiency

1 code implementation CoNLL (EMNLP) 2021 Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche

By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias.

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

1 code implementation 2 Oct 2020 Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

Representation Learning

Reinforcement Learning Framework for Deep Brain Stimulation Study

1 code implementation 22 Feb 2020 Dmitrii Krylov, Remi Tachet, Romain Laroche, Michael Rosenblum, Dmitry V. Dylov

Malfunctioning neurons in the brain sometimes operate synchronously, reportedly causing many neurological diseases, e.g., Parkinson's.

reinforcement-learning Reinforcement Learning (RL)

Safe Policy Improvement with an Estimated Baseline Policy

no code implementations 11 Sep 2019 Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes

Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance on the trained policy performance.


Safe Policy Improvement with Soft Baseline Bootstrapping

2 code implementations 11 Jul 2019 Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes

Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy.

Decentralized Exploration in Multi-Armed Bandits -- Extended version

no code implementations 19 Nov 2018 Raphaël Féraud, Réda Alami, Romain Laroche

We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment.

Multi-Armed Bandits

Counting to Explore and Generalize in Text-based Games

2 code implementations 29 Jun 2018 Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler

We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments.

text-based games

Safe Policy Improvement with Baseline Bootstrapping

2 code implementations 19 Dec 2017 Romain Laroche, Paul Trichelair, Rémi Tachet des Combes

Finally, we implement a model-free version of SPIBB and show its benefits on a navigation task with a deep RL implementation called SPIBB-DQN, which is, to the best of our knowledge, the first RL algorithm relying on a neural network representation able to train efficiently and reliably from batch data, without any interaction with the environment.

The Complex Negotiation Dialogue Game

no code implementations 5 Jul 2017 Romain Laroche

This position paper formalises an abstract model for complex negotiation dialogue.

One-Shot Learning Position +3

Reinforcement Learning Algorithm Selection

no code implementations ICLR 2018 Romain Laroche, Raphael Feraud

This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning.

reinforcement-learning Reinforcement Learning (RL)

Separation of Concerns in Reinforcement Learning

no code implementations 15 Dec 2016 Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche

In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task.

reinforcement-learning Reinforcement Learning (RL)
