1 code implementation • NAACL 2022 • Alice Martin, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin
To our knowledge, it is the first approach that successfully learns a language generation policy without pre-training, using only reinforcement learning.
no code implementations • 26 Jun 2023 • Giorgia Ramponi, Pavel Kolev, Olivier Pietquin, Niao He, Mathieu Laurière, Matthieu Geist
We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function.
no code implementations • 31 May 2023 • Paul Roit, Johan Ferret, Lior Shani, Roee Aharoni, Geoffrey Cideron, Robert Dadashi, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Orgad Keller, Nikola Momchev, Sabela Ramos, Piotr Stanczyk, Nino Vieillard, Olivier Bachem, Gal Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor
Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input.
Tasks: Abstractive Text Summarization, Natural Language Inference, +2
1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.
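For intuition, the MDVI update can be pictured as a softmax policy over a KL-discounted running sum of Q-estimates. The tabular sketch below illustrates that spirit only; the coefficients `lam` (KL) and `tau` (entropy) and the simplified evaluation step are assumptions, not the paper's exact scheme.

```python
import numpy as np

def mdvi(P, r, gamma=0.99, lam=1.0, tau=0.1, iters=500):
    """Tabular sketch of KL/entropy-regularized value iteration in the
    spirit of MDVI. P: (S, A, S) transitions, r: (S, A) rewards."""
    S, A = r.shape
    q = np.zeros((S, A))
    logits = np.zeros((S, A))   # mirror-descent state: discounted sum of past q's
    beta = 1.0 / (lam + tau)
    for _ in range(iters):
        logits = lam * beta * logits + beta * q   # KL term keeps memory of past policies
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        # Entropy-regularized evaluation of the new policy.
        v = (pi * (q - tau * np.log(pi + 1e-12))).sum(axis=1)
        q = r + gamma * P @ v
    return q, pi
```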
no code implementations • 2 May 2023 • Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi
We consider the Imitation Learning (IL) setup where expert data are collected not in the actual deployment environment but in a different version of it.
no code implementations • 30 Jan 2023 • Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel
We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice.
no code implementations • 7 Nov 2022 • Alexis Jacq, Manu Orsini, Gabriel Dulac-Arnold, Olivier Pietquin, Matthieu Geist, Olivier Bachem
Are the quantity and quality of data truly transformative to the performance of a general controller?
1 code implementation • 30 Sep 2022 • Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub
Lewis signaling games are a class of simple communication games for simulating the emergence of language.
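To show how simple these games are, here is a minimal Lewis game loop in which a speaker and a listener are trained with REINFORCE on a shared success reward; the sizes, learning rate, and absence of a baseline are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_msgs = 5, 5              # illustrative sizes (assumptions)
spk = np.zeros((n_states, n_msgs))   # speaker logits: state -> message
lst = np.zeros((n_msgs, n_states))   # listener logits: message -> guess
lr = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(5000):
    s = rng.integers(n_states)
    p_m = softmax(spk[s]); m = rng.choice(n_msgs, p=p_m)    # speaker emits a message
    p_a = softmax(lst[m]); a = rng.choice(n_states, p=p_a)  # listener guesses the state
    reward = float(a == s)            # success iff the state is recovered
    # REINFORCE: push up the log-probability of sampled choices, scaled by reward.
    g_m = -p_m; g_m[m] += 1.0
    g_a = -p_a; g_a[a] += 1.0
    spk[s] += lr * reward * g_m
    lst[m] += lr * reward * g_a
```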
no code implementations • 14 Sep 2022 • Geoffrey Cideron, Sertan Girgin, Anton Raichuk, Olivier Pietquin, Olivier Bachem, Léonard Hussenot
We propose a simple data augmentation technique based on round-trip translations and show in extensive experiments that the resulting vec2text model surprisingly leads to vector spaces that fulfill our four desired properties and that this model strongly outperforms both standard and denoising auto-encoders.
4 code implementations • 7 Sep 2022 • Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.
no code implementations • 22 Aug 2022 • Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts.
no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
no code implementations • 25 May 2022 • Mathieu Laurière, Sarah Perrin, Matthieu Geist, Olivier Pietquin
Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases.
no code implementations • 22 Mar 2022 • Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Élie, Olivier Pietquin, Matthieu Geist
One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
no code implementations • 16 Mar 2022 • Alexis Jacq, Johan Ferret, Olivier Pietquin, Matthieu Geist
We deem those states and corresponding actions important since they explain the difference in performance between the default and the new, lazy policy.
1 code implementation • 4 Nov 2021 • Sabela Ramos, Sertan Girgin, Léonard Hussenot, Damien Vincent, Hanna Yakubovich, Daniel Toyama, Anita Gergely, Piotr Stanczyk, Raphael Marinier, Jeremiah Harmsen, Olivier Pietquin, Nikola Momchev
We introduce RLDS (Reinforcement Learning Datasets), an ecosystem for recording, replaying, manipulating, annotating and sharing data in the context of Sequential Decision Making (SDM) including Reinforcement Learning (RL), Learning from Demonstrations, Offline RL or Imitation Learning.
1 code implementation • 19 Oct 2021 • Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin, Anton Raichuk, Matthieu Geist, Olivier Pietquin
The proposed approach consists in learning a discretization of continuous action spaces from human demonstrations.
no code implementations • 20 Sep 2021 • Sarah Perrin, Mathieu Laurière, Julien Pérolat, Romuald Élie, Matthieu Geist, Olivier Pietquin
Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents.
no code implementations • 20 Sep 2021 • Alice Martin Donati, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin
This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models from scratch using only reinforcement learning (RL).
no code implementations • 11 Jun 2021 • Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist
This is the converse of exploration in RL, which favors such actions.
no code implementations • NeurIPS 2021 • Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist
We propose to learn to distinguish reversible from irreversible actions for better informed decision-making in Reinforcement Learning (RL).
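One natural way to realize this, sketched below under an assumed architecture and sizes, is to train a self-supervised classifier to predict the temporal order of observation pairs sampled from trajectories; transitions whose order the classifier deems near-certain are likely hard to undo. This is a sketch of the general idea, not the paper's exact estimator.

```python
import torch
import torch.nn as nn

obs_dim = 16  # assumed observation size
net = nn.Sequential(nn.Linear(2 * obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(traj):
    """traj: (T, obs_dim) tensor of observations from one trajectory."""
    T = traj.shape[0]
    i, j = torch.randint(0, T, (2, 128))          # random index pairs
    keep = i != j
    i, j = i[keep], j[keep]
    pairs = torch.cat([traj[i], traj[j]], dim=-1)
    labels = (j > i).float().unsqueeze(-1)        # 1 if second obs comes later
    loss = bce(net(pairs), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# At decision time, a near-certain "order is fixed" prediction for (s, s')
# signals that the transition is hard to undo, i.e. likely irreversible.
```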
no code implementations • 7 Jun 2021 • Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Olivier Bachem, Rémi Munos, Olivier Pietquin
Mean-field Games (MFGs) are a continuous approximation of many-agent RL.
1 code implementation • NeurIPS 2021 • Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz
To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations.
no code implementations • 25 May 2021 • Leonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Lukasz Stafiniak, Sertan Girgin, Raphael Marinier, Nikola Momchev, Sabela Ramos, Manu Orsini, Olivier Bachem, Matthieu Geist, Olivier Pietquin
The vast literature on imitation learning mostly considers this reward function to be available for hyperparameter (HP) selection, but this is not a realistic setting.
1 code implementation • 20 May 2021 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin
Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize.
no code implementations • 17 May 2021 • Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin
We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals.
no code implementations • ICLR Workshop SSL-RL 2021 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin
We evaluate RAM on the procedurally-generated environment MiniGrid, against state-of-the-art methods.
no code implementations • ICLR Workshop SSL-RL 2021 • Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist
In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to enforce the policy to visit state-action pairs close to the support of logged transitions.
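A common way to enforce such closeness, sketched below, is to add a behavior-cloning penalty that keeps the actor near the logged actions; the weight `alpha` and the squared-error form are assumptions, and the paper's own mechanism may differ.

```python
import torch

alpha = 2.5  # assumed trade-off weight between value improvement and closeness

def actor_loss(critic, actor, states, logged_actions):
    """Sketch: maximize Q while penalizing deviation from logged actions."""
    pred_actions = actor(states)
    q = critic(states, pred_actions)
    bc = ((pred_actions - logged_actions) ** 2).mean()  # stay on the data support
    return -q.mean() + alpha * bc
```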
1 code implementation • 28 Feb 2021 • Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, Olivier Pietquin
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD).
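In outline, OMD accumulates Q-values in a dual variable and maps them back to a policy through a softmax. The sketch below makes the loop explicit; `q_eval` and `mu_forward` are hypothetical helpers standing in for policy evaluation and the population forward dynamics.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def omd_mfg(q_eval, mu_forward, S, A, lr=0.1, iters=200):
    """Sketch of Online Mirror Descent for an MFG. Assumed helper
    signatures: q_eval(pi, mu) -> (S, A) Q-values of pi against the
    population distribution mu; mu_forward(pi) -> distribution induced
    by the whole population playing pi."""
    y = np.zeros((S, A))                 # dual accumulator
    pi = softmax(y)
    for _ in range(iters):
        mu = mu_forward(pi)              # population plays the current policy
        y += lr * q_eval(pi, mu)         # accumulate Q-values...
        pi = softmax(y)                  # ...and mirror back through softmax
    return pi
```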
1 code implementation • ICLR 2021 • Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist
Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck.
no code implementations • ICLR 2021 • Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem
In recent years, reinforcement learning (RL) has been successfully applied to many different continuous control tasks.
no code implementations • 22 Dec 2020 • Johan Ferret, Olivier Pietquin, Matthieu Geist
Self-imitation learning is a Reinforcement Learning (RL) method that encourages actions whose returns were higher than expected, which helps in hard exploration and sparse reward problems.
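The core self-imitation idea admits a compact loss: clip the advantage at zero so that only actions that did better than the current value estimate are reinforced. A minimal sketch, not the paper's full algorithm:

```python
import torch
import torch.nn.functional as F

def sil_losses(logits, values, actions, returns):
    """logits: (B, A) policy logits; values: (B,) value estimates;
    actions, returns: (B,) past actions and their observed returns."""
    adv = (returns - values).clamp(min=0).detach()   # keep only "better than expected"
    logp = F.log_softmax(logits, dim=-1)
    logp_a = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    policy_loss = -(logp_a * adv).mean()
    value_loss = 0.5 * ((returns - values).clamp(min=0) ** 2).mean()
    return policy_loss, value_loss
```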
no code implementations • NeurIPS 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist
Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.
no code implementations • 21 Oct 2020 • Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour
We propose CHARM, a method for training a single neural network across inconsistent input channels.
no code implementations • EMNLP 2020 • Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville
Language drift has been one of the major obstacles to training language models through interaction.
no code implementations • 7 Aug 2020 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin
To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning.
6 code implementations • NeurIPS 2020 • Nino Vieillard, Olivier Pietquin, Matthieu Geist
Bootstrapping is a core mechanism in Reinforcement Learning (RL).
Ranked #8 on Atari Games (Atari-57)
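For readers new to the term used in the entry above: bootstrapping means the regression target for Q(s, a) reuses the network's own estimate at the next state. A minimal sketch:

```python
import torch

def td_target(q_net, rewards, next_states, dones, gamma=0.99):
    """One-step bootstrapped target: r + gamma * max_a' Q(s', a')."""
    with torch.no_grad():
        next_q = q_net(next_states).max(dim=1).values
        return rewards + gamma * (1.0 - dones) * next_q
```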
no code implementations • 15 Jul 2020 • Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin
This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the observations distribution in a transformer architecture.
1 code implementation • NeurIPS 2020 • Sarah Perrin, Julien Perolat, Mathieu Laurière, Matthieu Geist, Romuald Elie, Olivier Pietquin
In this paper, we deepen the analysis of the continuous-time Fictitious Play learning algorithm by considering various finite-state Mean Field Game settings (finite horizon, $\gamma$-discounted), allowing in particular for the introduction of an additional common noise.
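In discrete time, Fictitious Play amounts to best-responding against the running average of past population distributions; a sketch, with `best_response` as a hypothetical helper:

```python
import numpy as np

def fictitious_play(best_response, mu0, iters=100):
    """Sketch of Fictitious Play in an MFG. Assumed helper signature:
    best_response(mu_bar) -> distribution induced by a best response
    against the averaged population distribution mu_bar."""
    mu_bar = np.asarray(mu0, dtype=float)
    for k in range(1, iters + 1):
        mu_br = best_response(mu_bar)
        mu_bar = mu_bar + (mu_br - mu_bar) / (k + 1)  # running average of plays
    return mu_bar
```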
no code implementations • 23 Jun 2020 • Léonard Hussenot, Robert Dadashi, Matthieu Geist, Olivier Pietquin
Using an inverse RL approach, we show that complex exploration behaviors, reflecting different motivations, can be learnt and efficiently used by RL agents to solve tasks for which exhaustive exploration is prohibitive.
1 code implementation • 10 Jun 2020 • Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem
In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks.
2 code implementations • ICLR 2021 • Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin
Imitation Learning (IL) methods seek to match the behavior of an agent with that of an expert.
3 code implementations • 1 Jun 2020 • Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Abe Friesen, Ruba Haroun, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas
These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research.
no code implementations • 29 May 2020 • Olivier Buffet, Olivier Pietquin, Paul Weng
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e.g., board games, video games or autonomous vehicles.
no code implementations • 31 Mar 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist
Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.
no code implementations • ICML 2020 • Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville
At each time step, the teacher is created by copying the student agent, before being finetuned to maximize task completion.
no code implementations • 21 Oct 2019 • Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin
Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.
no code implementations • 21 Oct 2019 • Nino Vieillard, Bruno Scherrer, Olivier Pietquin, Matthieu Geist
We adapt the concept of momentum from optimization to reinforcement learning.
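Concretely, one way to add momentum is to act greedily with respect to a running average of successive Q-estimates rather than the latest one; the tabular sketch below illustrates this, with the exact averaging scheme an illustrative choice.

```python
import numpy as np

def momentum_value_iteration(P, r, gamma=0.99, iters=500):
    """Tabular sketch: greedy actions come from the averaged Q-estimate h,
    while the backup updates the latest estimate q.
    P: (S, A, S) transitions, r: (S, A) rewards."""
    S, A = r.shape
    q = np.zeros((S, A))
    h = np.zeros((S, A))                 # running average of past q's
    for k in range(1, iters + 1):
        h += (q - h) / k                 # h_k = mean(q_0, ..., q_{k-1})
        greedy = h.argmax(axis=1)        # greedy w.r.t. the averaged estimate
        v = q[np.arange(S), greedy]
        q = r + gamma * P @ v
    return q
```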
no code implementations • 18 Oct 2019 • Nino Vieillard, Olivier Pietquin, Matthieu Geist
In this paper, we draw connections between DP and (constrained) convex optimization.
no code implementations • 4 Oct 2019 • Mathieu Seurin, Philippe Preux, Olivier Pietquin
Violating constraints thus results in rejected actions or in entering a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes.
no code implementations • 25 Sep 2019 • Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin
Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.
1 code implementation • 18 Jul 2019 • Johan Ferret, Raphaël Marinier, Matthieu Geist, Olivier Pietquin
The ability to transfer knowledge to novel environments and tasks is a sensible desideratum for general learning agents.
no code implementations • 4 Jul 2019 • Romuald Elie, Julien Pérolat, Mathieu Laurière, Matthieu Geist, Olivier Pietquin
In order to design scalable algorithms for systems with a large population of interacting agents (e.g., swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite.
no code implementations • 1 Jul 2019 • Lucas Beyer, Damien Vincent, Olivier Teboul, Sylvain Gelly, Matthieu Geist, Olivier Pietquin
An agent learning through interactions should balance its action selection process between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour.
no code implementations • 24 Jun 2019 • Alexis Jacq, Julien Perolat, Matthieu Geist, Olivier Pietquin
We prove that in repeated symmetric games, this algorithm is a learning equilibrium.
no code implementations • 24 Jun 2019 • Nino Vieillard, Olivier Pietquin, Matthieu Geist
Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP).
no code implementations • 29 May 2019 • Léonard Hussenot, Matthieu Geist, Olivier Pietquin
In this setting, the adversary cannot directly modify the agent's state -- its representation of the environment -- but can only attack the agent's observation -- its perception of the environment.
no code implementations • ICLR 2019 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin
Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.
1 code implementation • NeurIPS 2019 • Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints.
no code implementations • 31 Jan 2019 • Matthieu Geist, Bruno Scherrer, Olivier Pietquin
Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence.
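For intuition, entropy regularization replaces the hard max in the Bellman backup with a temperature-scaled log-sum-exp; a tabular sketch, with the temperature `tau` an illustrative choice:

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(P, r, gamma=0.99, tau=0.1, iters=500):
    """Entropy-regularized backup: V(s) = tau * log sum_a exp(Q(s, a) / tau).
    P: (S, A, S) transitions, r: (S, A) rewards."""
    S, A = r.shape
    q = np.zeros((S, A))
    for _ in range(iters):
        v = tau * logsumexp(q / tau, axis=1)   # soft maximum over actions
        q = r + gamma * P @ v
    return q
```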
no code implementations • 20 Sep 2018 • Julien Perolat, Mateusz Malinowski, Bilal Piot, Olivier Pietquin
We study the problem of learning classifiers robust to universal adversarial perturbations.
1 code implementation • ECCV 2018 • Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin
Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue.
no code implementations • 29 May 2018 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin
Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.
1 code implementation • 12 Feb 2018 • Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, Olivier Pietquin
We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task.
4 code implementations • 27 Jul 2017 • Mel Vecerik, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, Martin Riedmiller
We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards.
no code implementations • 17 Jul 2017 • Alexandre Berard, Olivier Pietquin, Laurent Besacier
This paper presents the LIG-CRIStAL submission to the shared Automatic Post- Editing task of WMT 2017.
3 code implementations • NeurIPS 2017 • Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected.
15 code implementations • ICLR 2018 • Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.
Ranked #1 on Atari Games (Atari 2600 Surround)
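The key ingredient of the entry above is a linear layer whose weights are perturbed by learnable, factorized Gaussian noise; the sketch below follows that recipe, with the initialization constants assumptions rather than the paper's exact values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learnable, factorized Gaussian weight noise."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        # Learnable means and noise scales for weights and biases.
        self.w_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.w_sigma = nn.Parameter(
            torch.full((out_features, in_features), sigma0 / in_features ** 0.5))
        self.b_mu = nn.Parameter(torch.empty(out_features))
        self.b_sigma = nn.Parameter(
            torch.full((out_features,), sigma0 / in_features ** 0.5))
        bound = 1.0 / in_features ** 0.5
        nn.init.uniform_(self.w_mu, -bound, bound)
        nn.init.uniform_(self.b_mu, -bound, bound)

    @staticmethod
    def _f(x):  # noise-shaping function f(x) = sign(x) * sqrt(|x|)
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        # Factorized noise: one vector per input unit, one per output unit.
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        w = self.w_mu + self.w_sigma * torch.outer(eps_out, eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return F.linear(x, w, b)
```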
no code implementations • 20 Jun 2017 • Diana Borsa, Bilal Piot, Rémi Munos, Olivier Pietquin
Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent.
5 code implementations • 12 Apr 2017 • Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys
We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process and automatically assesses the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism.
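One ingredient of learning from demonstrations in this style is a large-margin supervised term that forces the expert's action to dominate all others; a sketch, with the margin value an assumption:

```python
import torch

def margin_loss(q_values, demo_actions, margin=0.8):
    """q_values: (B, A) Q-values; demo_actions: (B,) expert actions.
    Loss: max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), with l = margin
    for every action except the expert's, where it is zero."""
    l = torch.full_like(q_values, margin)
    l.scatter_(1, demo_actions.unsqueeze(1), 0.0)   # no margin on the expert action
    q_demo = q_values.gather(1, demo_actions.unsqueeze(1)).squeeze(1)
    return ((q_values + l).max(dim=1).values - q_demo).mean()
```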
2 code implementations • 15 Mar 2017 • Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin
End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning.
1 code implementation • 6 Dec 2016 • Alexandre Berard, Olivier Pietquin, Christophe Servan, Laurent Besacier
This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding.
4 code implementations • CVPR 2017 • Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville
Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images.
no code implementations • NeurIPS 2017 • Matthieu Geist, Bilal Piot, Olivier Pietquin
This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual.
no code implementations • 3 Jun 2016 • Bilal Piot, Matthieu Geist, Olivier Pietquin
This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data.
1 code implementation • LREC 2016 • Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier
We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words).
no code implementations • NeurIPS 2014 • Bilal Piot, Matthieu Geist, Olivier Pietquin
Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense.
no code implementations • LREC 2014 • Layla El Asri, Rémi Lemonnier, Romain Laroche, Olivier Pietquin, Hatim Khouzaimi
Appointment scheduling is a hybrid task halfway between slot-filling and negotiation.
no code implementations • LREC 2014 • Layla El Asri, Romain Laroche, Olivier Pietquin
NASTIA is a reinforcement learning-based system.
no code implementations • 16 Jan 2014 • Matthieu Geist, Olivier Pietquin
Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade.
no code implementations • NeurIPS 2012 • Edouard Klein, Matthieu Geist, Bilal Piot, Olivier Pietquin
This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal.