Search Results for author: Olivier Pietquin

Found 74 papers, 21 papers with code

Learning Mean Field Games: A Survey

no code implementations 25 May 2022 Mathieu Laurière, Sarah Perrin, Matthieu Geist, Olivier Pietquin

Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases.

Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

no code implementations 22 Mar 2022 Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Élie, Olivier Pietquin, Matthieu Geist

One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.

reinforcement-learning

Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

no code implementations 16 Mar 2022 Alexis Jacq, Johan Ferret, Olivier Pietquin, Matthieu Geist

We deem those states and corresponding actions important since they explain the difference in performance between the default and the new, lazy policy.

Atari Games Decision Making +1

RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

1 code implementation 4 Nov 2021 Sabela Ramos, Sertan Girgin, Léonard Hussenot, Damien Vincent, Hanna Yakubovich, Daniel Toyama, Anita Gergely, Piotr Stanczyk, Raphael Marinier, Jeremiah Harmsen, Olivier Pietquin, Nikola Momchev

We introduce RLDS (Reinforcement Learning Datasets), an ecosystem for recording, replaying, manipulating, annotating and sharing data in the context of Sequential Decision Making (SDM) including Reinforcement Learning (RL), Learning from Demonstrations, Offline RL or Imitation Learning.

Decision Making Imitation Learning +2

Continuous Control with Action Quantization from Demonstrations

no code implementations 19 Oct 2021 Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin, Anton Raichuk, Matthieu Geist, Olivier Pietquin

In this paper, we propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces by leveraging the priors of demonstrations.

Continuous Control Imitation Learning +1

Learning Natural Language Generation from Scratch

no code implementations 20 Sep 2021 Alice Martin Donati, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin

This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models from scratch by only using reinforcement learning (RL).

Language Modelling reinforcement-learning +1

Generalization in Mean Field Games by Learning Master Policies

no code implementations 20 Sep 2021 Sarah Perrin, Mathieu Laurière, Julien Pérolat, Romuald Élie, Matthieu Geist, Olivier Pietquin

Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents.

What Matters for Adversarial Imitation Learning?

no code implementations NeurIPS 2021 Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz

To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations.

Continuous Control Imitation Learning

Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness

1 code implementation 20 May 2021 Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize.

reinforcement-learning

Mean Field Games Flock! The Reinforcement Learning Way

no code implementations 17 May 2021 Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin

We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals.

reinforcement-learning

Offline Reinforcement Learning with Pseudometric Learning

no code implementations ICLR Workshop SSL-RL 2021 Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist

In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to enforce the policy to visit state-action pairs close to the support of logged transitions.

reinforcement-learning

Scaling up Mean Field Games with Online Mirror Descent

1 code implementation 28 Feb 2021 Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, Olivier Pietquin

We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD).

Adversarially Guided Actor-Critic

1 code implementation ICLR 2021 Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck.

Efficient Exploration

Self-Imitation Advantage Learning

no code implementations 22 Dec 2020 Johan Ferret, Olivier Pietquin, Matthieu Geist

Self-imitation learning is a Reinforcement Learning (RL) method that encourages actions whose returns were higher than expected, which helps in hard exploration and sparse reward problems.

Atari Games Imitation Learning +1

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

no code implementations NeurIPS 2020 Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

reinforcement-learning

Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering

no code implementations 21 Oct 2020 Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour

We propose CHARM, a method for training a single neural network across inconsistent input channels.

EEG

A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

no code implementations 7 Aug 2020 Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning.

Decision Making reinforcement-learning +2

The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

no code implementations 15 Jul 2020 Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin

This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the observations distribution in a transformer architecture.

Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications

1 code implementation NeurIPS 2020 Sarah Perrin, Julien Perolat, Mathieu Laurière, Matthieu Geist, Romuald Elie, Olivier Pietquin

In this paper, we extend the analysis of the continuous-time Fictitious Play learning algorithm to various finite-state Mean Field Game settings (finite horizon, $\gamma$-discounted), allowing in particular for the introduction of an additional common noise.

Show me the Way: Intrinsic Motivation from Demonstrations

no code implementations 23 Jun 2020 Léonard Hussenot, Robert Dadashi, Matthieu Geist, Olivier Pietquin

Using an inverse RL approach, we show that complex exploration behaviors, reflecting different motivations, can be learnt and efficiently used by RL agents to solve tasks for which exhaustive exploration is prohibitive.

Decision Making Experimental Design

Reinforcement Learning

no code implementations 29 May 2020 Olivier Buffet, Olivier Pietquin, Paul Weng

Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e.g., board games, video games or autonomous vehicles.

Autonomous Vehicles Board Games +2

Leverage the Average: an Analysis of KL Regularization in RL

no code implementations 31 Mar 2020 Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

reinforcement-learning

Countering Language Drift with Seeded Iterated Learning

no code implementations ICML 2020 Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

At each time step, the teacher is created by copying the student agent, before being finetuned to maximize task completion.

Translation

HIGhER : Improving instruction following with Hindsight Generation for Experience Replay

no code implementations 21 Oct 2019 Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.

Language Acquisition

I'm sorry Dave, I'm afraid I can't do that, Deep Q-learning from forbidden action

no code implementations 4 Oct 2019 Mathieu Seurin, Philippe Preux, Olivier Pietquin

Violating constraints thus results in rejected actions or entering a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes.

Industrial Robots Q-Learning

Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following

no code implementations 25 Sep 2019 Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.

Language Acquisition

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning

1 code implementation 18 Jul 2019 Johan Ferret, Raphaël Marinier, Matthieu Geist, Olivier Pietquin

The ability to transfer knowledge to novel environments and tasks is a sensible desideratum for general learning agents.

reinforcement-learning Transfer Learning

On the Convergence of Model Free Learning in Mean Field Games

no code implementations 4 Jul 2019 Romuald Elie, Julien Pérolat, Mathieu Laurière, Matthieu Geist, Olivier Pietquin

In order to design scalable algorithms for systems with a large population of interacting agents (e.g. swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite.

MULEX: Disentangling Exploitation from Exploration in Deep RL

no code implementations 1 Jul 2019 Lucas Beyer, Damien Vincent, Olivier Teboul, Sylvain Gelly, Matthieu Geist, Olivier Pietquin

An agent learning through interactions should balance its action selection process between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour.

Deep Conservative Policy Iteration

no code implementations 24 Jun 2019 Nino Vieillard, Olivier Pietquin, Matthieu Geist

Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP).

Atari Games reinforcement-learning

Foolproof Cooperative Learning

no code implementations 24 Jun 2019 Alexis Jacq, Julien Perolat, Matthieu Geist, Olivier Pietquin

We prove that in repeated symmetric games, this algorithm is a learning equilibrium.

CopyCAT: Taking Control of Neural Policies with Constant Attacks

no code implementations 29 May 2019 Léonard Hussenot, Matthieu Geist, Olivier Pietquin

In this setting, the adversary cannot directly modify the agent's state -- its representation of the environment -- but can only attack the agent's observation -- its perception of the environment.

Atari Games reinforcement-learning

Towards Consistent Performance on Atari using Expert Demonstrations

no code implementations ICLR 2019 Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Atari Games

A Theory of Regularized Markov Decision Processes

no code implementations 31 Jan 2019 Matthieu Geist, Bruno Scherrer, Olivier Pietquin

Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence.

Q-Learning reinforcement-learning

Visual Reasoning with Multi-hop Feature Modulation

1 code implementation ECCV 2018 Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue.

Question Answering Visual Dialog +2

Observe and Look Further: Achieving Consistent Performance on Atari

1 code implementation 29 May 2018 Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Montezuma's Revenge

End-to-End Automatic Speech Translation of Audiobooks

1 code implementation 12 Feb 2018 Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, Olivier Pietquin

We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task.

Speech-to-Text Translation Translation

LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task

no code implementations 17 Jul 2017 Alexandre Berard, Olivier Pietquin, Laurent Besacier

This paper presents the LIG-CRIStAL submission to the shared Automatic Post-Editing task of WMT 2017.

Automatic Post-Editing

Noisy Networks for Exploration

14 code implementations ICLR 2018 Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.

Atari Games Efficient Exploration +1

Observational Learning by Reinforcement Learning

no code implementations 20 Jun 2017 Diana Borsa, Bilal Piot, Rémi Munos, Olivier Pietquin

Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent.

reinforcement-learning

Deep Q-learning from Demonstrations

5 code implementations 12 Apr 2017 Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.

Decision Making Imitation Learning +1

End-to-end optimization of goal-driven and visually grounded dialogue systems

2 code implementations 15 Mar 2017 Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning.

Dialogue Management Visual Question Answering

Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

1 code implementation 6 Dec 2016 Alexandre Berard, Olivier Pietquin, Christophe Servan, Laurent Besacier

This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding.

Speech-to-Text Translation Translation

GuessWhat?! Visual object discovery through multi-modal dialogue

3 code implementations CVPR 2017 Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville

Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images.

Object Discovery

Is the Bellman residual a bad proxy?

no code implementations NeurIPS 2017 Matthieu Geist, Bilal Piot, Olivier Pietquin

This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual.

reinforcement-learning

Difference of Convex Functions Programming Applied to Control with Expert Data

no code implementations 3 Jun 2016 Bilal Piot, Matthieu Geist, Olivier Pietquin

This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data.

General Classification reinforcement-learning

MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

1 code implementation LREC 2016 Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier

We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words).

Document Classification General Classification +2

Difference of Convex Functions Programming for Reinforcement Learning

no code implementations NeurIPS 2014 Bilal Piot, Matthieu Geist, Olivier Pietquin

Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense.

Frame reinforcement-learning

Kalman Temporal Differences

no code implementations 16 Jan 2014 Matthieu Geist, Olivier Pietquin

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade.

reinforcement-learning

Inverse Reinforcement Learning through Structured Classification

no code implementations NeurIPS 2012 Edouard Klein, Matthieu Geist, Bilal Piot, Olivier Pietquin

This paper addresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal.

Classification General Classification +1
