Search Results for author: Romain Laroche

Found 43 papers, 13 papers with code

Behavior Prior Representation learning for Offline Reinforcement Learning

no code implementations • 2 Nov 2022 • Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet des Combes, Romain Laroche

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions.

Offline RL, Reinforcement Learning

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

no code implementations • 1 Nov 2022 • Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives.

Reinforcement Learning

When does return-conditioned supervised learning work for offline reinforcement learning?

1 code implementation • 2 Jun 2022 • David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna

Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL).

D4RL, Reinforcement Learning

Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning

no code implementations • 2 Jun 2022 • David Brandfonbrener, Remi Tachet des Combes, Romain Laroche

In this work, we develop a novel method for incorporating scalable uncertainty estimates into an offline reinforcement learning algorithm called deep-SPIBB that extends the SPIBB family of algorithms to environments with larger state and action spaces.

Reinforcement Learning

Non-Markovian policies occupancy measures

no code implementations • 27 May 2022 • Romain Laroche, Remi Tachet des Combes, Jacob Buckman

A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent's actions are chosen from a memoryless probability distribution, conditioned only on its current state.

Reinforcement Learning

Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms

no code implementations • 15 Feb 2022 • Romain Laroche, Remi Tachet

To increase the unlearning speed, we study a novel policy update: the gradient of the cross-entropy loss with respect to the action maximizing $q$, but find that such updates may lead to a decrease in value.

On the Chattering of SARSA with Linear Function Approximation

no code implementations • 14 Feb 2022 • Shangtong Zhang, Remi Tachet, Romain Laroche

SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region.

Dr Jekyll & Mr Hyde: the strange case of off-policy policy updates

no code implementations • NeurIPS 2021 • Romain Laroche, Remi Tachet des Combes

To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (J&H), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores.

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

1 code implementation • 4 Nov 2021 • Shangtong Zhang, Remi Tachet, Romain Laroche

In this paper, we establish the global optimality and convergence rate of an off-policy actor-critic algorithm in the tabular setting without using density ratios to correct the discrepancy between the state distribution of the behavior policy and that of the target policy.

Policy Gradient Methods

Batched Bandits with Crowd Externalities

no code implementations • 29 Sep 2021 • Romain Laroche, Othmane Safsafi, Raphael Feraud, Nicolas Broutin

In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step.

Multi-Armed Bandits

Learnability and Expressiveness in Self-Supervised Learning

no code implementations • 29 Sep 2021 • Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville

In this work, we argue that representations induced by self-supervised learning (SSL) methods should be both expressive and learnable.

Data Augmentation, Self-Supervised Learning

Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates

1 code implementation • 29 Sep 2021 • Romain Laroche, Remi Tachet

To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (JH), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores.

The Emergence of the Shape Bias Results from Communicative Efficiency

1 code implementation • CoNLL (EMNLP) 2021 • Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche

By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias.

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

no code implementations • NeurIPS 2021 • Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche

We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting.

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

1 code implementation • 2 Oct 2020 • Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

Representation Learning

Reinforcement Learning Framework for Deep Brain Stimulation Study

1 code implementation • 22 Feb 2020 • Dmitrii Krylov, Remi Tachet, Romain Laroche, Michael Rosenblum, Dmitry V. Dylov

Malfunctioning neurons in the brain sometimes operate synchronously, reportedly causing many neurological diseases, e.g., Parkinson's.

Reinforcement Learning

Safe Policy Improvement with an Estimated Baseline Policy

no code implementations • 11 Sep 2019 • Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes

Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance of the trained policy's performance.


Safe Policy Improvement with Soft Baseline Bootstrapping

2 code implementations • 11 Jul 2019 • Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes

Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy.

Decentralized Exploration in Multi-Armed Bandits -- Extended version

no code implementations • 19 Nov 2018 • Raphaël Féraud, Réda Alami, Romain Laroche

We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment.

Multi-Armed Bandits

Counting to Explore and Generalize in Text-based Games

2 code implementations • 29 Jun 2018 • Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler

We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments.

Text-Based Games

Safe Policy Improvement with Baseline Bootstrapping

2 code implementations • 19 Dec 2017 • Romain Laroche, Paul Trichelair, Rémi Tachet des Combes

Finally, we implement a model-free version of SPIBB and show its benefits on a navigation task with a deep RL implementation called SPIBB-DQN, which is, to the best of our knowledge, the first RL algorithm relying on a neural network representation that is able to train efficiently and reliably from batch data, without any interaction with the environment.

The Complex Negotiation Dialogue Game

no code implementations • 5 Jul 2017 • Romain Laroche

This position paper formalises an abstract model for complex negotiation dialogue.

One-Shot Learning, Reinforcement Learning

Reinforcement Learning Algorithm Selection

no code implementations • ICLR 2018 • Romain Laroche, Raphael Feraud

This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning.

Reinforcement Learning

Separation of Concerns in Reinforcement Learning

no code implementations • 15 Dec 2016 • Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche

In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task.

Reinforcement Learning
