no code implementations • LREC 2014 • Layla El Asri, Rémi Lemonnier, Romain Laroche, Olivier Pietquin, Hatim Khouzaimi
Appointment scheduling is a hybrid task halfway between slot-filling and negotiation.
no code implementations • LREC 2014 • Layla El Asri, Romain Laroche, Olivier Pietquin
NASTIA is a reinforcement learning-based system.
no code implementations • 15 Dec 2016 • Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche
In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task.
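To make the decomposition concrete, here is a minimal sketch of one way such a multi-learner setup can be wired, assuming a toy tabular setting in which each learner receives its own reward component and the aggregator acts greedily on the sum of their Q-values; the class and function names are illustrative, not the paper's.

```python
import numpy as np

class ComponentLearner:
    """One learner focusing on a single reward component (illustrative only)."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r_component, s_next):
        # Standard Q-learning update on this learner's own reward signal.
        target = r_component + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (target - self.q[s, a])

def select_action(learners, s, eps=0.1):
    """Aggregate the learners' Q-values and act (epsilon-)greedily on the sum."""
    n_actions = learners[0].q.shape[1]
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    summed_q = sum(l.q[s] for l in learners)
    return int(summed_q.argmax())
```

Each learner only ever sees its own reward signal; the single-agent behaviour emerges from aggregating their value estimates at action-selection time.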
no code implementations • ICLR 2018 • Romain Laroche, Raphael Feraud
This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning.
no code implementations • ICLR 2018 • Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen
We consider tackling a single-agent RL problem by distributing it to $n$ learners.
1 code implementation • NeurIPS 2017 • Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang
One of the main challenges in reinforcement learning (RL) is generalisation.
no code implementations • 5 Jul 2017 • Romain Laroche
This position paper formalises an abstract model for complex negotiation dialogue.
2 code implementations • 19 Dec 2017 • Romain Laroche, Paul Trichelair, Rémi Tachet des Combes
Finally, we implement a model-free version of SPIBB, a deep RL algorithm called SPIBB-DQN, and show its benefits on a navigation task; to the best of our knowledge, SPIBB-DQN is the first RL algorithm relying on a neural network representation that is able to train efficiently and reliably from batch data, without any interaction with the environment.
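As a rough illustration of how a baseline-bootstrapped target can be computed for a single transition, here is a minimal sketch assuming access to the baseline policy probabilities and to a boolean mask flagging next-state actions that are too rarely seen in the batch to be trusted; the names and signature are illustrative, not the paper's implementation.

```python
import numpy as np

def spibb_dqn_target(r, gamma, q_next, pi_b_next, uncertain_mask):
    """Compute a SPIBB-style bootstrapped target for one transition (sketch).

    q_next:         array of target-network Q-values Q(s', a') over actions
    pi_b_next:      baseline policy probabilities pi_b(a'|s')
    uncertain_mask: boolean array, True where (s', a') was too rarely
                    observed in the batch to be trusted
    """
    # On uncertain actions, follow the baseline: keep its probability mass.
    uncertain_part = np.sum(pi_b_next[uncertain_mask] * q_next[uncertain_mask])
    # On well-estimated actions, the remaining mass is free to be greedy.
    certain_mass = np.sum(pi_b_next[~uncertain_mask])
    certain_part = certain_mass * (q_next[~uncertain_mask].max()
                                   if (~uncertain_mask).any() else 0.0)
    return r + gamma * (uncertain_part + certain_part)
```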
2 code implementations • 29 Jun 2018 • Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler
We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments.
no code implementations • 19 Nov 2018 • Raphaël Féraud, Réda Alami, Romain Laroche
We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment.
1 code implementation • NeurIPS 2019 • Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints.
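One common way to write the kind of constraint such an extension enforces, shown here for illustration only (the cost signal $C$ and budget $\beta$ are symbols introduced here, not taken from the paper):

$$ \max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t} R(s_t, a_t)\right] \quad \text{s.t.} \quad \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t} C(s_t, a_t)\right] \le \beta. $$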
2 code implementations • 11 Jul 2019 • Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes
Batch Reinforcement Learning (Batch RL) consists of training a policy using trajectories collected with another policy, called the behavioural policy.
no code implementations • 11 Sep 2019 • Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes
Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance of the trained policy's performance.
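A minimal tabular sketch of that "reproduce the baseline where uncertain" constraint, assuming state-action visit counts from the batch, an estimated Q-function, and an illustrative count threshold; this is a sketch of the idea, not the paper's code.

```python
import numpy as np

def spibb_policy(q_hat, pi_b, counts, n_min=20):
    """Project a greedy improvement onto the SPIBB constraint set (sketch).

    q_hat:  (S, A) estimated action-values from the batch
    pi_b:   (S, A) baseline policy probabilities
    counts: (S, A) number of times each pair appears in the batch
    n_min:  count threshold below which a pair is considered uncertain
    """
    n_states, _ = q_hat.shape
    pi = np.zeros_like(pi_b)
    for s in range(n_states):
        uncertain = counts[s] < n_min
        # Uncertain pairs: copy the baseline probabilities exactly.
        pi[s, uncertain] = pi_b[s, uncertain]
        if (~uncertain).any():
            # Certain pairs: give all remaining mass to the best certain action.
            free_mass = pi_b[s, ~uncertain].sum()
            best = np.flatnonzero(~uncertain)[q_hat[s, ~uncertain].argmax()]
            pi[s, best] += free_mass
    return pi
```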
1 code implementation • 21 Oct 2019 • Mikuláš Zelinka, Xingdi Yuan, Marc-Alexandre Côté, Romain Laroche, Adam Trischler
We are interested in learning how to update Knowledge Graphs (KG) from text.
1 code implementation • NeurIPS 2020 • Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton
Playing text-based games requires skills in processing natural language and sequential decision making.
1 code implementation • 22 Feb 2020 • Dmitrii Krylov, Remi Tachet, Romain Laroche, Michael Rosenblum, Dmitry V. Dylov
Malfunctioning neurons in the brain sometimes operate synchronously, reportedly causing many neurological diseases, e.g. Parkinson's.
1 code implementation • 2 Oct 2020 • Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes
In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.
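For context, the omission being referred to can be seen by comparing the policy gradient of the discounted objective, which carries a $\gamma^t$ factor per time step,

$$ \nabla_\theta J_\gamma(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t \ge 0} \gamma^{t}\, q_{\pi_\theta}(s_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\right], $$

with the update most implementations actually apply, which drops the leading $\gamma^t$:

$$ \Delta\theta \propto q_{\pi_\theta}(s_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t). $$

These are standard textbook forms, reproduced here for illustration; the paper's own analysis is not restated.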
no code implementations • NeurIPS 2021 • Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting.
1 code implementation • CoNLL (EMNLP) 2021 • Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche
By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias.
1 code implementation • 29 Sep 2021 • Romain Laroche, Remi Tachet
To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (JH), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores.
no code implementations • 29 Sep 2021 • Romain Laroche, Othmane Safsafi, Raphael Feraud, Nicolas Broutin
In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step.
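To make the constraint concrete, here is a toy sketch in which an index policy is frozen for the duration of each batch and the arm statistics are only refreshed at batch boundaries; the UCB-style index and all names are illustrative, not the algorithm studied in the paper.

```python
import numpy as np

def batched_ucb(pull, n_arms, n_batches=10, batch_size=100):
    """Toy batched bandit loop: the arm-selection rule is recomputed only
    between batches, never inside one (sketch, not the paper's algorithm).

    pull: function arm -> stochastic reward in [0, 1]
    """
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    t = 0
    for _ in range(n_batches):
        # Freeze the policy: compute UCB indices from current statistics.
        means = sums / np.maximum(counts, 1)
        bonus = np.sqrt(2 * np.log(max(t, 1)) / np.maximum(counts, 1))
        indices = np.where(counts == 0, np.inf, means + bonus)
        arm = int(indices.argmax())
        # Play the whole batch with the frozen choice, then update once.
        for _ in range(batch_size):
            r = pull(arm)
            counts[arm] += 1
            sums[arm] += r
            t += 1
    return sums / np.maximum(counts, 1)
```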
no code implementations • 29 Sep 2021 • Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville
In this work, we argue that representations induced by self-supervised learning (SSL) methods should be both expressive and learnable.
1 code implementation • NeurIPS 2023 • Shangtong Zhang, Remi Tachet, Romain Laroche
In this paper, we establish the global optimality and convergence rate of an off-policy actor-critic algorithm in the tabular setting, without using density ratios to correct the discrepancy between the state distribution of the behavior policy and that of the target policy.
no code implementations • NeurIPS 2021 • Romain Laroche, Remi Tachet des Combes
To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (J&H), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores.
no code implementations • 14 Feb 2022 • Shangtong Zhang, Remi Tachet, Romain Laroche
SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region.
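For reference, the linear-function-approximation SARSA update in question has the standard form

$$ \theta_{t+1} = \theta_t + \alpha_t \left( r_{t+1} + \gamma\, \theta_t^{\top} \phi(s_{t+1}, a_{t+1}) - \theta_t^{\top} \phi(s_t, a_t) \right) \phi(s_t, a_t), $$

with $a_{t+1}$ chosen (e.g. $\epsilon$-greedily) with respect to the current weights $\theta_t$; this coupling between the weights and the behaviour policy is what is usually associated with the chattering discussed above.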
no code implementations • 15 Feb 2022 • Romain Laroche, Remi Tachet
To increase the unlearning speed, we study a novel policy update: the gradient of the cross-entropy loss with the action maximizing $q$ as target, but we find that such updates may lead to a decrease in value.
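A minimal sketch of what such an update could look like for a softmax policy, reading the cross-entropy loss as being taken between the policy and the one-hot distribution on the greedy action of the current $q$ estimate; the NumPy implementation and names are illustrative only.

```python
import numpy as np

def cross_entropy_to_greedy_update(logits, q_values, lr=0.1):
    """One gradient step pushing a softmax policy towards argmax_a q(s, a).

    Minimises the cross-entropy between pi = softmax(logits) and the one-hot
    distribution on the greedy action (sketch only).
    """
    greedy = np.zeros_like(logits)
    greedy[np.argmax(q_values)] = 1.0
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    # d/dlogits of CE(one_hot, softmax(logits)) = softmax(logits) - one_hot
    grad = pi - greedy
    return logits - lr * grad
```

The entry above notes that updates of this kind may nevertheless decrease the policy's value, which is the pitfall being studied.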
no code implementations • 9 Mar 2022 • Nathaniel Weir, Xingdi Yuan, Marc-Alexandre Côté, Matthew Hausknecht, Romain Laroche, Ida Momennejad, Harm van Seijen, Benjamin Van Durme
Humans have the capability, aided by the expressive compositionality of their language, to learn quickly by demonstration.
no code implementations • 27 May 2022 • Romain Laroche, Remi Tachet des Combes, Jacob Buckman
A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent's actions are chosen from a memoryless probability distribution, conditioned only on its current state.
1 code implementation • 2 Jun 2022 • David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna
Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL).
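The recipe covered by the term can be sketched as follows: compute returns-to-go along each offline trajectory, fit a policy to dataset actions conditioned on (state, return-to-go), and query it with a high target return at test time. Below is a minimal data-preparation sketch; the function names and trajectory format are illustrative assumptions, not the paper's.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Suffix sums of rewards: the return still to be collected at each step."""
    rtg = np.zeros(len(rewards))
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        rtg[t] = acc
    return rtg

def rcsl_training_pairs(trajectories, gamma=1.0):
    """Turn offline trajectories into supervised (input, target-action) pairs.

    trajectories: list of dicts with 'states', 'actions', 'rewards' arrays
    Returns inputs (state, return-to-go) and the actions to imitate.
    """
    inputs, targets = [], []
    for traj in trajectories:
        rtg = returns_to_go(traj["rewards"], gamma)
        for s, a, g in zip(traj["states"], traj["actions"], rtg):
            inputs.append((s, g))   # condition on the outcome actually achieved
            targets.append(a)       # imitate the action that was taken
    return inputs, targets
```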
no code implementations • 2 Jun 2022 • Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni
We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.
no code implementations • 2 Jun 2022 • David Brandfonbrener, Remi Tachet des Combes, Romain Laroche
In this work, we develop deep-SPIBB, a novel method that incorporates scalable uncertainty estimates into offline reinforcement learning and extends the SPIBB family of algorithms to environments with larger state and action spaces.
no code implementations • 3 Oct 2022 • Tristan Karch, Yoann Lemesle, Romain Laroche, Clément Moulin-Frier, Pierre-Yves Oudeyer
In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel.
no code implementations • 1 Nov 2022 • Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes
Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives.
1 code implementation • 2 Nov 2022 • Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet des Combes, Romain Laroche
Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm.
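A minimal PyTorch-style sketch of that two-phase recipe, assuming discrete actions, an `encoder` and `action_head` provided as `nn.Module`s, and a `dataset` yielding (state, action) batches; all of these are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def train_bpr_encoder(encoder, action_head, dataset, epochs=10, lr=1e-3):
    """Phase 1 (sketch): learn a state representation by behaviour cloning.

    `dataset` is assumed to yield (states, actions) tensor batches from the
    offline data; the encoder and a small action head are trained jointly to
    predict the dataset action from the encoded state.
    """
    params = list(encoder.parameters()) + list(action_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # discrete actions assumed for illustration
    for _ in range(epochs):
        for states, actions in dataset:
            logits = action_head(encoder(states))
            loss = loss_fn(logits, actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Phase 2 (not shown here): freeze the representation and train any
    # off-the-shelf offline RL algorithm on top of encoder(state).
    for p in encoder.parameters():
        p.requires_grad_(False)
    return encoder
```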
1 code implementation • 24 May 2023 • Jikun Kang, Romain Laroche, Xindi Yuan, Adam Trischler, Xue Liu, Jie Fu
We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in its parameters throughout training.
1 code implementation • 22 Jun 2023 • Zhang-Wei Hong, Pulkit Agrawal, Rémi Tachet des Combes, Romain Laroche
This re-weighted sampling strategy may be combined with any offline RL algorithm.
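A minimal sketch of such a re-weighted sampler, assuming an illustrative softmax weighting over episode returns (the paper's exact weighting scheme is not reproduced here); the sampled transitions can then be fed to any offline RL learner.

```python
import numpy as np

def trajectory_sampling_weights(episode_returns, temperature=1.0):
    """Sampling weights that favour higher-return trajectories (illustrative
    softmax weighting; not the paper's exact scheme)."""
    z = np.asarray(episode_returns, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

def sample_transition(trajectories, weights, rng=np.random.default_rng()):
    """Draw a trajectory according to the weights, then a transition uniformly
    within it; trajectories are assumed to store one more state than actions."""
    i = rng.choice(len(trajectories), p=weights)
    traj = trajectories[i]
    t = rng.integers(len(traj["states"]) - 1)
    return (traj["states"][t], traj["actions"][t],
            traj["rewards"][t], traj["states"][t + 1])
```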
1 code implementation • 30 Sep 2023 • Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations.
1 code implementation • NeurIPS 2023 • Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal
We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset.