Search Results for author: Olivier Pietquin

Found 92 papers, 29 papers with code

Learning Natural Language Generation with Truncated Reinforcement Learning

1 code implementation • NAACL 2022 • Alice Martin, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin

To our knowledge, it is the first approach that successfully learns a language generation policy without pre-training, using only reinforcement learning.

Language Modelling Question Generation +4

Paper
Code

Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

1 code implementation • 30 Apr 2024 • Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin

While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO).

Reinforcement Learning (RL) Text Generation

Paper
Code

Language Evolution with Deep Learning

no code implementations • 18 Mar 2024 • Mathieu Rita, Paul Michel, Rahma Chaabouni, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

Computational modeling plays an essential role in the study of language emergence.

Paper
Add Code

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

no code implementations • 6 Mar 2024 • Zida Wu, Mathieu Lauriere, Samuel Jia Cong Chua, Matthieu Geist, Olivier Pietquin, Ankur Mehta

Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task.

Paper
Add Code

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

no code implementations • 22 Feb 2024 • Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models.

Paper
Add Code

MusicRL: Aligning Music Generation to Human Preferences

no code implementations • 6 Feb 2024 • Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli

MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards.

Music Generation

Paper
Add Code

Learning Discrete-Time Major-Minor Mean Field Games

1 code implementation • 17 Dec 2023 • Kai Cui, Gökçe Dayanıklı, Mathieu Laurière, Matthieu Geist, Olivier Pietquin, Heinz Koeppl

We propose a novel discrete time version of major-minor MFGs (M3FGs), along with a learning algorithm based on fictitious play and partitioning the probability simplex.

Paper
Code

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

no code implementations • 31 May 2023 • Paul Roit, Johan Ferret, Lior Shani, Roee Aharoni, Geoffrey Cideron, Robert Dadashi, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Orgad Keller, Nikola Momchev, Sabela Ramos, Piotr Stanczyk, Nino Vieillard, Olivier Bachem, Gal Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor

Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input.

Abstractive Text Summarization Natural Language Inference +2

Paper
Add Code

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.

regression Reinforcement Learning (RL)

Paper
Code

Get Back Here: Robust Imitation by Return-to-Distribution Planning

no code implementations • 2 May 2023 • Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi

We consider the Imitation Learning (IL) setup where expert data are not collected on the actual deployment environment but on a different version.

Imitation Learning

Paper
Add Code

SingSong: Generating musical accompaniments from singing

no code implementations • 30 Jan 2023 • Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice.

Audio Generation Retrieval

Paper
Add Code

On the importance of data collection for training general goal-reaching policies

no code implementations • 7 Nov 2022 • Alexis Jacq, Manu Orsini, Gabriel Dulac-Arnold, Olivier Pietquin, Matthieu Geist, Olivier Bachem

Are the quantity and quality of data truly transformative to the performance of a general controller?

Continuous Control

Paper
Add Code

Emergent Communication: Generalization and Overfitting in Lewis Games

1 code implementation • 30 Sep 2022 • Mathieu Rita, Corentin Tallec, Paul Michel, Jean-bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

Lewis signaling games are a class of simple communication games for simulating the emergence of language.

Paper
Code

vec2text with Round-Trip Translations

no code implementations • 14 Sep 2022 • Geoffrey Cideron, Sertan Girgin, Anton Raichuk, Olivier Pietquin, Olivier Bachem, Léonard Hussenot

We propose a simple data augmentation technique based on round-trip translations and show in extensive experiments that the resulting vec2text model surprisingly leads to vector spaces that fulfill our four desired properties and that this model strongly outperforms both standard and denoising auto-encoders.

Data Augmentation Denoising +1

Paper
Add Code

AudioLM: a Language Modeling Approach to Audio Generation

5 code implementations • 7 Sep 2022 • Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.

Audio Generation Language Modelling

32,805

Paper
Code

Learning Correlated Equilibria in Mean-Field Games

no code implementations • 22 Aug 2022 • Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls

The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts.

Paper
Add Code

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári

In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learning in Mean Field Games: A Survey

no code implementations • 25 May 2022 • Mathieu Laurière, Sarah Perrin, Julien Pérolat, Sertan Girgin, Paul Muller, Romuald Élie, Matthieu Geist, Olivier Pietquin

Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases.

Reinforcement Learning (RL)

Paper
Add Code

Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

no code implementations • 22 Mar 2022 • Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Élie, Olivier Pietquin, Matthieu Geist

One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

no code implementations • 16 Mar 2022 • Alexis Jacq, Johan Ferret, Olivier Pietquin, Matthieu Geist

We deem those states and corresponding actions important since they explain the difference in performance between the default and the new, lazy policy.

Atari Games Decision Making +2

Paper
Add Code

RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

1 code implementation • 4 Nov 2021 • Sabela Ramos, Sertan Girgin, Léonard Hussenot, Damien Vincent, Hanna Yakubovich, Daniel Toyama, Anita Gergely, Piotr Stanczyk, Raphael Marinier, Jeremiah Harmsen, Olivier Pietquin, Nikola Momchev

We introduce RLDS (Reinforcement Learning Datasets), an ecosystem for recording, replaying, manipulating, annotating and sharing data in the context of Sequential Decision Making (SDM) including Reinforcement Learning (RL), Learning from Demonstrations, Offline RL or Imitation Learning.

Imitation Learning Offline RL +2

224

Paper
Code

Continuous Control with Action Quantization from Demonstrations

1 code implementation • 19 Oct 2021 • Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin, Anton Raichuk, Matthieu Geist, Olivier Pietquin

The proposed approach consists in learning a discretization of continuous action spaces from human demonstrations.

Continuous Control Imitation Learning +2

32,932

Paper
Code

Learning Natural Language Generation from Scratch

no code implementations • 20 Sep 2021 • Alice Martin Donati, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin

This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original ap-proach to train conditional language models from scratch by only using reinforcement learning (RL).

Language Modelling reinforcement-learning +2

Paper
Add Code

Generalization in Mean Field Games by Learning Master Policies

no code implementations • 20 Sep 2021 • Sarah Perrin, Mathieu Laurière, Julien Pérolat, Romuald Élie, Matthieu Geist, Olivier Pietquin

Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents.

Paper
Add Code

Implicitly Regularized RL with Implicit Q-Values

no code implementations • 16 Aug 2021 • Nino Vieillard, Marcin Andrychowicz, Anton Raichuk, Olivier Pietquin, Matthieu Geist

to $Q$.

Reinforcement Learning (RL)

Paper
Add Code

Offline Reinforcement Learning as Anti-Exploration

no code implementations • 11 Jun 2021 • Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist

This is the converse of exploration in RL, which favors such actions.

Continuous Control Offline RL +2

Paper
Add Code

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

no code implementations • NeurIPS 2021 • Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

We propose to learn to distinguish reversible from irreversible actions for better informed decision-making in Reinforcement Learning (RL).

Decision Making Reinforcement Learning (RL)

Paper
Add Code

Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint

no code implementations • 7 Jun 2021 • Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Olivier Bachem, Rémi Munos, Olivier Pietquin

Mean-field Games (MFGs) are a continuous approximation of many-agent RL.

Imitation Learning reinforcement-learning +1

Paper
Add Code

What Matters for Adversarial Imitation Learning?

1 code implementation • NeurIPS 2021 • Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz

To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations.

Continuous Control Imitation Learning

388

Paper
Code

Hyperparameter Selection for Imitation Learning

no code implementations • 25 May 2021 • Leonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Lukasz Stafiniak, Sertan Girgin, Raphael Marinier, Nikola Momchev, Sabela Ramos, Manu Orsini, Olivier Bachem, Matthieu Geist, Olivier Pietquin

The vast literature in imitation learning mostly considers this reward function to be available for HP selection, but this is not a realistic setting.

Continuous Control Imitation Learning

Paper
Add Code

Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness

1 code implementation • 20 May 2021 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize.

Paper
Code

Mean Field Games Flock! The Reinforcement Learning Way

no code implementations • 17 May 2021 • Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin

We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Relevant Action Matters : Motivating agent with action usefulness

no code implementations • ICLR Workshop SSL-RL 2021 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

We evaluate RAM on the procedurally-generated environment MiniGrid, against state-of-the-art methods.

Paper
Add Code

Offline Reinforcement Learning with Pseudometric Learning

no code implementations • ICLR Workshop SSL-RL 2021 • Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist

In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to enforce the policy to visit state-action pairs close to the support of logged transitions.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Scaling up Mean Field Games with Online Mirror Descent

1 code implementation • 28 Feb 2021 • Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, Olivier Pietquin

We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD).

4,018

Paper
Code

Adversarially Guided Actor-Critic

1 code implementation • ICLR 2021 • Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck.

Efficient Exploration

Paper
Code

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study

no code implementations • ICLR 2021 • Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

In recent years, reinforcement learning (RL) has been successfully applied to many different continuous control tasks.

Attribute Continuous Control +1

Paper
Add Code

Self-Imitation Advantage Learning

no code implementations • 22 Dec 2020 • Johan Ferret, Olivier Pietquin, Matthieu Geist

Self-imitation learning is a Reinforcement Learning (RL) method that encourages actions whose returns were higher than expected, which helps in hard exploration and sparse reward problems.

Atari Games Imitation Learning +1

Paper
Add Code

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

no code implementations • NeurIPS 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering

no code implementations • 21 Oct 2020 • Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour

We propose CHARM, a method for training a single neural network across inconsistent input channels.

EEG

Paper
Add Code

Supervised Seeded Iterated Learning for Interactive Language Learning

no code implementations • EMNLP 2020 • Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

Language drift has been one of the major obstacles to train language models through interaction.

Translation

Paper
Add Code

A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

no code implementations • 7 Aug 2020 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning.

Decision Making reinforcement-learning +3

Paper
Add Code

Munchausen Reinforcement Learning

6 code implementations • NeurIPS 2020 • Nino Vieillard, Olivier Pietquin, Matthieu Geist

Bootstrapping is a core mechanism in Reinforcement Learning (RL).

Ranked #9 on Atari Games on Atari-57

Atari Games reinforcement-learning +1

32,940

Paper
Code

The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

no code implementations • 15 Jul 2020 • Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin

This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the observations distribution in a transformer architecture.

Paper
Add Code

Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications

1 code implementation • NeurIPS 2020 • Sarah Perrin, Julien Perolat, Mathieu Laurière, Matthieu Geist, Romuald Elie, Olivier Pietquin

In this paper, we deepen the analysis of continuous time Fictitious Play learning algorithm to the consideration of various finite state Mean Field Game settings (finite horizon, $\gamma$-discounted), allowing in particular for the introduction of an additional common noise.

4,018

Paper
Code

Show me the Way: Intrinsic Motivation from Demonstrations

no code implementations • 23 Jun 2020 • Léonard Hussenot, Robert Dadashi, Matthieu Geist, Olivier Pietquin

Using an inverse RL approach, we show that complex exploration behaviors, reflecting different motivations, can be learnt and efficiently used by RL agents to solve tasks for which exhaustive exploration is prohibitive.

Decision Making Experimental Design +1

Paper
Add Code

What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

1 code implementation • 10 Jun 2020 • Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks.

Attribute Continuous Control +2

193

Paper
Code

Primal Wasserstein Imitation Learning

2 code implementations • ICLR 2021 • Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin

Imitation Learning (IL) methods seek to match the behavior of an agent with that of an expert.

Continuous Control Imitation Learning

32,943

Paper
Code

Acme: A Research Framework for Distributed Reinforcement Learning

5 code implementations • 1 Jun 2020 • Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Abe Friesen, Ruba Haroun, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas

These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research.

DQN Replay Dataset reinforcement-learning +1

3,386

Paper
Code

Reinforcement Learning

no code implementations • 29 May 2020 • Olivier Buffet, Olivier Pietquin, Paul Weng

Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e. g., board games, video games or autonomous vehicles.

Autonomous Vehicles Board Games +3

Paper
Add Code

Leverage the Average: an Analysis of KL Regularization in RL

no code implementations • 31 Mar 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

Reinforcement Learning (RL)

Paper
Add Code

Countering Language Drift with Seeded Iterated Learning

no code implementations • ICML 2020 • Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

At each time step, the teacher is created by copying the student agent, before being finetuned to maximize task completion.

Translation

Paper
Add Code

HIGhER : Improving instruction following with Hindsight Generation for Experience Replay

no code implementations • 21 Oct 2019 • Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.

Instruction Following Language Acquisition

Paper
Add Code

Momentum in Reinforcement Learning

no code implementations • 21 Oct 2019 • Nino Vieillard, Bruno Scherrer, Olivier Pietquin, Matthieu Geist

We adapt the optimization's concept of momentum to reinforcement learning.

Atari Games reinforcement-learning +1

Paper
Add Code

On Connections between Constrained Optimization and Reinforcement Learning

no code implementations • 18 Oct 2019 • Nino Vieillard, Olivier Pietquin, Matthieu Geist

In this paper, we draw connections between DP and (constrained) convex optimization.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

I'm sorry Dave, I'm afraid I can't do that, Deep Q-learning from forbidden action

no code implementations • 4 Oct 2019 • Mathieu Seurin, Philippe Preux, Olivier Pietquin

Violating constraints thus results in rejected actions or entering in a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes.

Industrial Robots Q-Learning +2

Paper
Add Code

Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following

no code implementations • 25 Sep 2019 • Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.

Instruction Following Language Acquisition

Paper
Add Code

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning

1 code implementation • 18 Jul 2019 • Johan Ferret, Raphaël Marinier, Matthieu Geist, Olivier Pietquin

The ability to transfer knowledge to novel environments and tasks is a sensible desiderata for general learning agents.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Code

On the Convergence of Model Free Learning in Mean Field Games

no code implementations • 4 Jul 2019 • Romuald Elie, Julien Pérolat, Mathieu Laurière, Matthieu Geist, Olivier Pietquin

In order to design scalable algorithms for systems with a large population of interacting agents (e. g. swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite.

Paper
Add Code

MULEX: Disentangling Exploitation from Exploration in Deep RL

no code implementations • 1 Jul 2019 • Lucas Beyer, Damien Vincent, Olivier Teboul, Sylvain Gelly, Matthieu Geist, Olivier Pietquin

An agent learning through interactions should balance its action selection process between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour.

Paper
Add Code

Deep Conservative Policy Iteration

no code implementations • 24 Jun 2019 • Nino Vieillard, Olivier Pietquin, Matthieu Geist

Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP).

Atari Games Reinforcement Learning (RL)

Paper
Add Code

Foolproof Cooperative Learning

no code implementations • 24 Jun 2019 • Alexis Jacq, Julien Perolat, Matthieu Geist, Olivier Pietquin

We prove that in repeated symmetric games, this algorithm is a learning equilibrium.

Paper
Add Code

CopyCAT: Taking Control of Neural Policies with Constant Attacks

no code implementations • 29 May 2019 • Léonard Hussenot, Matthieu Geist, Olivier Pietquin

In this setting, the adversary cannot directly modify the agent's state -- its representation of the environment -- but can only attack the agent's observation -- its perception of the environment.

Atari Games reinforcement-learning +1

Paper
Add Code

Towards Consistent Performance on Atari using Expert Demonstrations

no code implementations • ICLR 2019 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Atari Games Reinforcement Learning (RL)

Paper
Add Code

Budgeted Reinforcement Learning in Continuous State Space

1 code implementation • NeurIPS 2019 • Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints.

Autonomous Driving reinforcement-learning +1

539

Paper
Code

A Theory of Regularized Markov Decision Processes

no code implementations • 31 Jan 2019 • Matthieu Geist, Bruno Scherrer, Olivier Pietquin

Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence.

Q-Learning

Paper
Add Code

Playing the Game of Universal Adversarial Perturbations

no code implementations • 20 Sep 2018 • Julien Perolat, Mateusz Malinowski, Bilal Piot, Olivier Pietquin

We study the problem of learning classifiers robust to universal adversarial perturbations.

Classification General Classification +1

Paper
Add Code

Visual Reasoning with Multi-hop Feature Modulation

1 code implementation • ECCV 2018 • Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue.

Question Answering Visual Dialog +2

Paper
Code

Observe and Look Further: Achieving Consistent Performance on Atari

no code implementations • 29 May 2018 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Montezuma's Revenge Reinforcement Learning (RL)

Paper
Add Code

End-to-End Automatic Speech Translation of Audiobooks

1 code implementation • 12 Feb 2018 • Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, Olivier Pietquin

We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task.

Speech-to-Text Translation Translation

Paper
Code

LIG-CRIStAL Submission for the WMT 2017 Automatic Post-Editing Task

no code implementations • WS 2017 • Alex B{\'e}rard, re, Laurent Besacier, Olivier Pietquin

Automatic Post-Editing Spelling Correction

Paper
Add Code

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

4 code implementations • 27 Jul 2017 • Mel Vecerik, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, Martin Riedmiller

We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards.

reinforcement-learning Reinforcement Learning (RL)

816

Paper
Code

LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task

no code implementations • 17 Jul 2017 • Alexandre Berard, Olivier Pietquin, Laurent Besacier

This paper presents the LIG-CRIStAL submission to the shared Automatic Post- Editing task of WMT 2017.

Automatic Post-Editing Sentence

Paper
Add Code

Modulating early visual processing by language

3 code implementations • NeurIPS 2017 • Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected.

Question Answering Visual Question Answering

Paper
Code

Noisy Networks for Exploration

15 code implementations • ICLR 2018 • Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.

Ranked #1 on Atari Games on Atari 2600 Surround

Atari Games Efficient Exploration +2

2,602

Paper
Code

Observational Learning by Reinforcement Learning

no code implementations • 20 Jun 2017 • Diana Borsa, Bilal Piot, Rémi Munos, Olivier Pietquin

Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Deep Q-learning from Demonstrations

5 code implementations • 12 Apr 2017 • Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.

Imitation Learning Q-Learning +1

2,602

Paper
Code

End-to-end optimization of goal-driven and visually grounded dialogue systems

2 code implementations • 15 Mar 2017 • Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning.

Decoder Dialogue Management +2

Paper
Code

Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

1 code implementation • 6 Dec 2016 • Alexandre Berard, Olivier Pietquin, Christophe Servan, Laurent Besacier

This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding.

Speech-to-Text Translation Translation

388

Paper
Code

GuessWhat?! Visual object discovery through multi-modal dialogue

4 code implementations • CVPR 2017 • Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville

Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images.

Object Object Discovery

Paper
Code

Is the Bellman residual a bad proxy?

no code implementations • NeurIPS 2017 • Matthieu Geist, Bilal Piot, Olivier Pietquin

This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Difference of Convex Functions Programming Applied to Control with Expert Data

no code implementations • 3 Jun 2016 • Bilal Piot, Matthieu Geist, Olivier Pietquin

This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data.

General Classification reinforcement-learning +1

Paper
Add Code

MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

1 code implementation • LREC 2016 • Alex B{\'e}rard, re, Christophe Servan, Olivier Pietquin, Laurent Besacier

We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words).

Document Classification General Classification +2

116

Paper
Code

Human-Machine Dialogue as a Stochastic Game

no code implementations • WS 2015 • Merwan Barlier, Julien Perolat, Romain Laroche, Olivier Pietquin

Decision Making Multi-agent Reinforcement Learning +3

Paper
Add Code

Difference of Convex Functions Programming for Reinforcement Learning

no code implementations • NeurIPS 2014 • Bilal Piot, Matthieu Geist, Olivier Pietquin

Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistant in the Vapnik sense.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

NASTIA: Negotiating Appointment Setting Interface

no code implementations • LREC 2014 • Layla El Asri, R{\'e}mi Lemonnier, Romain Laroche, Olivier Pietquin, Hatim Khouzaimi

Appointment scheduling is a hybrid task halfway between slot-filling and negotiation.

Decision Making Scheduling +3

Paper
Add Code

DINASTI: Dialogues with a Negotiating Appointment Setting Interface

no code implementations • LREC 2014 • Layla El Asri, Romain Laroche, Olivier Pietquin

NASTIA is a reinforcement learning-based system.

Dialogue Management Management +4

Paper
Add Code

Kalman Temporal Differences

no code implementations • 16 Jan 2014 • Matthieu Geist, Olivier Pietquin

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade.

Management reinforcement-learning +1