no code implementations • LREC 2014 • Layla El Asri, Rémi Lemonnier, Romain Laroche, Olivier Pietquin, Hatim Khouzaimi
Appointment scheduling is a hybrid task halfway between slot-filling and negotiation.
no code implementations • LREC 2014 • Layla El Asri, Romain Laroche, Olivier Pietquin
NASTIA is a reinforcement learning-based system.
no code implementations • 15 Dec 2016 • Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche
In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task.
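To make the decomposition concrete, here is a minimal sketch of one way such a multi-learner setup can be wired, assuming a toy tabular setting in which each learner receives its own reward component and the aggregator acts greedily on the sum of their Q-values; the class and function names are illustrative, not the paper's.

```python
import numpy as np

class ComponentLearner:
    """One learner focusing on a single reward component (illustrative only)."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r_component, s_next):
        # Standard Q-learning update on this learner's own reward signal.
        target = r_component + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (target - self.q[s, a])

def select_action(learners, s, eps=0.1):
    """Aggregate the learners' Q-values and act (epsilon-)greedily on the sum."""
    n_actions = learners[0].q.shape[1]
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    summed_q = sum(l.q[s] for l in learners)
    return int(summed_q.argmax())
```

Each learner only ever sees its own reward signal; the single-agent behaviour emerges from aggregating their value estimates at action-selection time.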
no code implementations • ICLR 2018 • Romain Laroche, Raphael Feraud
This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning.
no code implementations • ICLR 2018 • Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen
We consider tackling a single-agent RL problem by distributing it to $n$ learners.
1 code implementation • NeurIPS 2017 • Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang
One of the main challenges in reinforcement learning (RL) is generalisation.
no code implementations • 5 Jul 2017 • Romain Laroche
This position paper formalises an abstract model for complex negotiation dialogue.
2 code implementations • 19 Dec 2017 • Romain Laroche, Paul Trichelair, Rémi Tachet des Combes
Finally, we implement a model-free version of SPIBB, a deep RL algorithm called SPIBB-DQN, and show its benefits on a navigation task; to the best of our knowledge, SPIBB-DQN is the first RL algorithm relying on a neural network representation that is able to train efficiently and reliably from batch data, without any interaction with the environment.
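As a rough illustration of how a baseline-bootstrapped target can be computed for a single transition, here is a minimal sketch assuming access to the baseline policy probabilities and to a boolean mask flagging next-state actions that are too rarely seen in the batch to be trusted; the names and signature are illustrative, not the paper's implementation.

```python
import numpy as np

def spibb_dqn_target(r, gamma, q_next, pi_b_next, uncertain_mask):
    """Compute a SPIBB-style bootstrapped target for one transition (sketch).

    q_next:         array of target-network Q-values Q(s', a') over actions
    pi_b_next:      baseline policy probabilities pi_b(a'|s')
    uncertain_mask: boolean array, True where (s', a') was too rarely
                    observed in the batch to be trusted
    """
    # On uncertain actions, follow the baseline: keep its probability mass.
    uncertain_part = np.sum(pi_b_next[uncertain_mask] * q_next[uncertain_mask])
    # On well-estimated actions, the remaining mass is free to be greedy.
    certain_mass = np.sum(pi_b_next[~uncertain_mask])
    certain_part = certain_mass * (q_next[~uncertain_mask].max()
                                   if (~uncertain_mask).any() else 0.0)
    return r + gamma * (uncertain_part + certain_part)
```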
2 code implementations • 29 Jun 2018 • Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler
We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments.
no code implementations • 19 Nov 2018 • Raphaël Féraud, Réda Alami, Romain Laroche
We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment.
1 code implementation • NeurIPS 2019 • Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints.
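One common way to write the kind of constraint such an extension enforces, shown here for illustration only (the cost signal $C$ and budget $\beta$ are symbols introduced here, not taken from the paper):

$$ \max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t} R(s_t, a_t)\right] \quad \text{s.t.} \quad \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t} C(s_t, a_t)\right] \le \beta. $$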
2 code implementations • 11 Jul 2019 • Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes
Batch Reinforcement Learning (Batch RL) consists of training a policy using trajectories collected with another policy, called the behavioural policy.
no code implementations • 11 Sep 2019 • Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes
Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance of the trained policy's performance.
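A minimal tabular sketch of that "reproduce the baseline where uncertain" constraint, assuming state-action visit counts from the batch, an estimated Q-function, and an illustrative count threshold; this is a sketch of the idea, not the paper's code.

```python
import numpy as np

def spibb_policy(q_hat, pi_b, counts, n_min=20):
    """Project a greedy improvement onto the SPIBB constraint set (sketch).

    q_hat:  (S, A) estimated action-values from the batch
    pi_b:   (S, A) baseline policy probabilities
    counts: (S, A) number of times each pair appears in the batch
    n_min:  count threshold below which a pair is considered uncertain
    """
    n_states, _ = q_hat.shape
    pi = np.zeros_like(pi_b)
    for s in range(n_states):
        uncertain = counts[s] < n_min
        # Uncertain pairs: copy the baseline probabilities exactly.
        pi[s, uncertain] = pi_b[s, uncertain]
        if (~uncertain).any():
            # Certain pairs: give all remaining mass to the best certain action.
            free_mass = pi_b[s, ~uncertain].sum()
            best = np.flatnonzero(~uncertain)[q_hat[s, ~uncertain].argmax()]
            pi[s, best] += free_mass
    return pi
```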
1 code implementation • 21 Oct 2019 • Mikuláš Zelinka, Xingdi Yuan, Marc-Alexandre Côté, Romain Laroche, Adam Trischler
We are interested in learning how to update Knowledge Graphs (KG) from text.
1 code implementation • NeurIPS 2020 • Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton
Playing text-based games requires skills in processing natural language and sequential decision making.
1 code implementation • 22 Feb 2020 • Dmitrii Krylov, Remi Tachet, Romain Laroche, Michael Rosenblum, Dmitry V. Dylov
Malfunctioning neurons in the brain sometimes operate synchronously, reportedly causing many neurological diseases, e.g. Parkinson's.
1 code implementation • 2 Oct 2020 • Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes
In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.
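For context, the omission being referred to can be seen by comparing the policy gradient of the discounted objective, which carries a $\gamma^t$ factor per time step,

$$ \nabla_\theta J_\gamma(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t \ge 0} \gamma^{t}\, q_{\pi_\theta}(s_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\right], $$

with the update most implementations actually apply, which drops the leading $\gamma^t$:

$$ \Delta\theta \propto q_{\pi_\theta}(s_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t). $$

These are standard textbook forms, reproduced here for illustration; the paper's own analysis is not restated.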
no code implementations • NeurIPS 2021 • Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting.
1 code implementation • CoNLL (EMNLP) 2021 • Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche
By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias.
1 code implementation • 29 Sep 2021 • Romain Laroche, Remi Tachet
To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (JH), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores.
no code implementations • 29 Sep 2021 • Romain Laroche, Othmane Safsafi, Raphael Feraud, Nicolas Broutin
In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step.
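To make the constraint concrete, here is a toy sketch in which an index policy is frozen for the duration of each batch and the arm statistics are only refreshed at batch boundaries; the UCB-style index and all names are illustrative, not the algorithm studied in the paper.

```python
import numpy as np

def batched_ucb(pull, n_arms, n_batches=10, batch_size=100):
    """Toy batched bandit loop: the arm-selection rule is recomputed only
    between batches, never inside one (sketch, not the paper's algorithm).

    pull: function arm -> stochastic reward in [0, 1]
    """
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    t = 0
    for _ in range(n_batches):
        # Freeze the policy: compute UCB indices from current statistics.
        means = sums / np.maximum(counts, 1)
        bonus = np.sqrt(2 * np.log(max(t, 1)) / np.maximum(counts, 1))
        indices = np.where(counts == 0, np.inf, means + bonus)
        arm = int(indices.argmax())
        # Play the whole batch with the frozen choice, then update once.
        for _ in range(batch_size):
            r = pull(arm)
            counts[arm] += 1
            sums[arm] += r
            t += 1
    return sums / np.maximum(counts, 1)
```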
no code implementations • 29 Sep 2021 • Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville
In this work, we argue that representations induced by self-supervised learning (SSL) methods should be both expressive and learnable.
1 code implementation • NeurIPS 2023 • Shangtong Zhang, Remi Tachet, Romain Laroche
In this paper, we establish the global optimality and convergence rate of an off-policy actor-critic algorithm in the tabular setting, without using density ratios to correct the discrepancy between the state distribution of the behavior policy and that of the target policy.
no code implementations • NeurIPS 2021 • Romain Laroche, Remi Tachet des Combes
To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (J&H), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores.
no code implementations • 14 Feb 2022 • Shangtong Zhang, Remi Tachet, Romain Laroche
SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region.
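For reference, the linear-function-approximation SARSA update in question has the standard form

$$ \theta_{t+1} = \theta_t + \alpha_t \left( r_{t+1} + \gamma\, \theta_t^{\top} \phi(s_{t+1}, a_{t+1}) - \theta_t^{\top} \phi(s_t, a_t) \right) \phi(s_t, a_t), $$

with $a_{t+1}$ chosen (e.g. $\epsilon$-greedily) with respect to the current weights $\theta_t$; this coupling between the weights and the behaviour policy is what is usually associated with the chattering discussed above.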
no code implementations • 15 Feb 2022 • Romain Laroche, Remi Tachet
To increase the unlearning speed, we study a novel policy update: the gradient of the cross-entropy loss with the action maximizing $q$ as target, but we find that such updates may lead to a decrease in value.
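A minimal sketch of what such an update could look like for a softmax policy, reading the cross-entropy loss as being taken between the policy and the one-hot distribution on the greedy action of the current $q$ estimate; the NumPy implementation and names are illustrative only.

```python
import numpy as np

def cross_entropy_to_greedy_update(logits, q_values, lr=0.1):
    """One gradient step pushing a softmax policy towards argmax_a q(s, a).

    Minimises the cross-entropy between pi = softmax(logits) and the one-hot
    distribution on the greedy action (sketch only).
    """
    greedy = np.zeros_like(logits)
    greedy[np.argmax(q_values)] = 1.0
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    # d/dlogits of CE(one_hot, softmax(logits)) = softmax(logits) - one_hot
    grad = pi - greedy
    return logits - lr * grad
```

The entry above notes that updates of this kind may nevertheless decrease the policy's value, which is the pitfall being studied.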
no code implementations • 9 Mar 2022 • Nathaniel Weir, Xingdi Yuan, Marc-Alexandre Côté, Matthew Hausknecht, Romain Laroche, Ida Momennejad, Harm van Seijen, Benjamin Van Durme
Humans have the capability, aided by the expressive compositionality of their language, to learn quickly by demonstration.
no code implementations • 27 May 2022 • Romain Laroche, Remi Tachet des Combes, Jacob Buckman
A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent's actions are chosen from a memoryless probability distribution, conditioned only on its current state.
1 code implementation • 2 Jun 2022 • David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna
Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL).
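The recipe covered by the term can be sketched as follows: compute returns-to-go along each offline trajectory, fit a policy to dataset actions conditioned on (state, return-to-go), and query it with a high target return at test time. Below is a minimal data-preparation sketch; the function names and trajectory format are illustrative assumptions, not the paper's.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Suffix sums of rewards: the return still to be collected at each step."""
    rtg = np.zeros(len(rewards))
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        rtg[t] = acc
    return rtg

def rcsl_training_pairs(trajectories, gamma=1.0):
    """Turn offline trajectories into supervised (input, target-action) pairs.

    trajectories: list of dicts with 'states', 'actions', 'rewards' arrays
    Returns inputs (state, return-to-go) and the actions to imitate.
    """
    inputs, targets = [], []
    for traj in trajectories:
        rtg = returns_to_go(traj["rewards"], gamma)
        for s, a, g in zip(traj["states"], traj["actions"], rtg):
            inputs.append((s, g))   # condition on the outcome actually achieved
            targets.append(a)       # imitate the action that was taken
    return inputs, targets
```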
no code implementations • 2 Jun 2022 • Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni
We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.
no code implementations • 2 Jun 2022 • David Brandfonbrener, Remi Tachet des Combes, Romain Laroche
In this work, we develop deep-SPIBB, a novel method that incorporates scalable uncertainty estimates into offline reinforcement learning and extends the SPIBB family of algorithms to environments with larger state and action spaces.
no code implementations • 3 Oct 2022 • Tristan Karch, Yoann Lemesle, Romain Laroche, Clément Moulin-Frier, Pierre-Yves Oudeyer
In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel.
no code implementations • 1 Nov 2022 • Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes
Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives.
1 code implementation • 2 Nov 2022 • Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet des Combes, Romain Laroche
Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm.
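A minimal PyTorch-style sketch of that two-phase recipe, assuming discrete actions, an `encoder` and `action_head` provided as `nn.Module`s, and a `dataset` yielding (state, action) batches; all of these are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def train_bpr_encoder(encoder, action_head, dataset, epochs=10, lr=1e-3):
    """Phase 1 (sketch): learn a state representation by behaviour cloning.

    `dataset` is assumed to yield (states, actions) tensor batches from the
    offline data; the encoder and a small action head are trained jointly to
    predict the dataset action from the encoded state.
    """
    params = list(encoder.parameters()) + list(action_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # discrete actions assumed for illustration
    for _ in range(epochs):
        for states, actions in dataset:
            logits = action_head(encoder(states))
            loss = loss_fn(logits, actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Phase 2 (not shown here): freeze the representation and train any
    # off-the-shelf offline RL algorithm on top of encoder(state).
    for p in encoder.parameters():
        p.requires_grad_(False)
    return encoder
```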
1 code implementation • 24 May 2023 • Jikun Kang, Romain Laroche, Xindi Yuan, Adam Trischler, Xue Liu, Jie Fu
We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in its parameters throughout training.
1 code implementation • 22 Jun 2023 • Zhang-Wei Hong, Pulkit Agrawal, Rémi Tachet des Combes, Romain Laroche
This re-weighted sampling strategy may be combined with any offline RL algorithm.
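A minimal sketch of such a re-weighted sampler, assuming an illustrative softmax weighting over episode returns (the paper's exact weighting scheme is not reproduced here); the sampled transitions can then be fed to any offline RL learner.

```python
import numpy as np

def trajectory_sampling_weights(episode_returns, temperature=1.0):
    """Sampling weights that favour higher-return trajectories (illustrative
    softmax weighting; not the paper's exact scheme)."""
    z = np.asarray(episode_returns, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

def sample_transition(trajectories, weights, rng=np.random.default_rng()):
    """Draw a trajectory according to the weights, then a transition uniformly
    within it; trajectories are assumed to store one more state than actions."""
    i = rng.choice(len(trajectories), p=weights)
    traj = trajectories[i]
    t = rng.integers(len(traj["states"]) - 1)
    return (traj["states"][t], traj["actions"][t],
            traj["rewards"][t], traj["states"][t + 1])
```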
1 code implementation • 30 Sep 2023 • Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations.
1 code implementation • NeurIPS 2023 • Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal
We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset.