no code implementations • 2 May 2023 • Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Léonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi
We consider the Imitation Learning (IL) setup where expert data are not collected on the actual deployment environment but on a different version of it.
no code implementations • 19 May 2022 • Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist
To learn the parameters of the energy function, the solution of the underlying energy-minimization problem is typically fed into a loss function.
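This "minimizer fed into a loss" recipe can be made concrete. Below is a minimal NumPy sketch, assuming a quadratic energy E(y; x, W) = ½‖y‖² − y·(Wx) whose minimizer y* = Wx is available in closed form, so no inner solver is needed; it illustrates the general recipe, not the paper's generalized Fenchel-Young construction.

```python
# Minimal sketch (not the paper's Fenchel-Young losses): learn energy
# parameters by feeding the energy minimizer into a downstream loss. The
# quadratic energy E(y; x, W) = 0.5*||y||^2 - y.(W x) has the closed-form
# minimizer y* = W x, which keeps the example runnable without an inner solver.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3)) * 0.1                  # energy parameters
X = rng.normal(size=(100, 3))                      # inputs
Y = X @ np.array([[1., 0.], [0., 1.], [1., 1.]])   # synthetic targets

for _ in range(200):
    Y_star = X @ W.T                 # inner problem: y* = argmin_y E(y; x, W)
    residual = Y_star - Y            # outer loss: 0.5 * ||y* - y_true||^2
    W -= 0.5 * residual.T @ X / len(X)   # chain rule through y* = W x
print("final loss:", 0.5 * np.mean(np.sum((X @ W.T - Y) ** 2, axis=1)))
```

With a non-quadratic energy, y* would come from an inner solver and the outer gradient from implicit differentiation or unrolling.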
1 code implementation • 19 Oct 2021 • Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin, Anton Raichuk, Matthieu Geist, Olivier Pietquin
The proposed approach consists of learning a discretization of continuous action spaces from human demonstrations.
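As a rough stand-in for this idea, the sketch below derives a small discrete action set from demonstrated continuous actions with plain k-means; the paper instead learns a state-conditioned quantization.

```python
# Rough stand-in: quantize demonstrated continuous actions with k-means,
# yielding K discrete actions a discrete-action RL agent can then use.
import numpy as np

def kmeans(actions, k=8, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), size=k, replace=False)]
    for _ in range(iters):
        d = ((actions[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = actions[assign == j].mean(axis=0)
    return centers

demo_actions = np.random.default_rng(1).normal(size=(1000, 2))  # fake demos
discrete_actions = kmeans(demo_actions)
print(discrete_actions.shape)  # (8, 2): an 8-way discrete action space
```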
no code implementations • 11 Jun 2021 • Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist
Penalizing out-of-distribution actions is the converse of exploration in RL, which favors such actions.
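A minimal sketch of this anti-exploration idea: subtract a novelty bonus from the reward instead of adding it. The nearest-neighbor bonus below is an illustrative stand-in for a learned novelty model.

```python
# Sketch of anti-exploration: subtract a novelty bonus from the reward.
# Nearest-neighbor distance to the logged (s, a) pairs stands in for the
# learned novelty model used in practice.
import numpy as np

logged_sa = np.random.default_rng(0).normal(size=(5000, 4))  # logged (s, a)

def anti_exploration_reward(sa, reward, alpha=1.0):
    novelty = np.sqrt(((logged_sa - sa) ** 2).sum(axis=1)).min()
    return reward - alpha * novelty   # penalize out-of-distribution actions

print(anti_exploration_reward(np.zeros(4), reward=1.0))
```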
no code implementations • NeurIPS 2021 • Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz
To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations.
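For context, here is a bare-bones version of the adversarial imitation loop whose design choices such a study varies: a discriminator is trained to separate expert from agent transitions, and its output defines the imitation reward. A linear discriminator with analytic logistic-regression gradients keeps the sketch self-contained; the reward shape at the end is one of several such studies compare.

```python
# Bare-bones adversarial imitation step: a discriminator separates expert
# from agent transitions; its output defines the imitation reward.
import numpy as np

rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, size=(512, 6))  # expert (s, a) features
agent = rng.normal(loc=0.0, size=(512, 6))   # agent (s, a) features
w, b = np.zeros(6), 0.0

def D(x):  # probability that x comes from the expert
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

for _ in range(200):  # minimize binary cross-entropy
    grad_w = expert.T @ (D(expert) - 1) / 512 + agent.T @ D(agent) / 512
    grad_b = (D(expert) - 1).mean() + D(agent).mean()
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

reward = -np.log(1.0 - D(agent) + 1e-8)  # one of the reward shapes compared
print(reward.mean())
```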
no code implementations • 25 May 2021 • Léonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Lukasz Stafiniak, Sertan Girgin, Raphael Marinier, Nikola Momchev, Sabela Ramos, Manu Orsini, Olivier Bachem, Matthieu Geist, Olivier Pietquin
The vast imitation learning literature mostly assumes that the environment's reward function is available for hyperparameter (HP) selection, but this is not a realistic setting.
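One reward-free proxy for HP selection, sketched below under assumed names (illustrative, not necessarily the paper's exact protocol), scores a policy by how well it matches held-out expert actions.

```python
# Illustrative reward-free proxy (assumed names, not the paper's API):
# score a policy by its action-matching error on held-out demonstrations,
# then pick the hyperparameters with the best score.
import numpy as np

def action_matching_score(policy, heldout_states, heldout_actions):
    pred = np.stack([policy(s) for s in heldout_states])
    return -np.mean((pred - heldout_actions) ** 2)  # higher is better

states = np.random.default_rng(0).normal(size=(100, 3))
actions = states @ np.ones((3, 1))                  # fake expert actions
print(action_matching_score(lambda s: s.sum(keepdims=True), states, actions))
# HP selection: best_hp = max(hps, key=lambda hp: action_matching_score(...))
```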
no code implementations • ICLR Workshop SSL-RL 2021 • Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist
In the presence of function approximation, and given the limited coverage of the environment's state-action space by the logged data, it is necessary to constrain the policy to visit state-action pairs close to the support of the logged transitions.
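A toy version of such a support constraint, using plain Euclidean distance to the logged transitions as a stand-in for a learned pseudometric:

```python
# Toy support constraint: penalize the critic target by the distance of
# (s, a) to the logged transitions. Euclidean distance stands in for a
# learned pseudometric; beta controls the strength of the constraint.
import numpy as np

dataset_sa = np.random.default_rng(0).normal(size=(2000, 4))  # logged (s, a)

def penalized_target(r, q_next, sa, beta=5.0, gamma=0.99):
    dist_to_support = np.sqrt(((dataset_sa - sa) ** 2).sum(axis=1)).min()
    return r + gamma * q_next - beta * dist_to_support

print(penalized_target(1.0, 0.5, np.zeros(4)))
```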
no code implementations • 23 Jun 2020 • Léonard Hussenot, Robert Dadashi, Matthieu Geist, Olivier Pietquin
Using an inverse RL approach, we show that complex exploration behaviors, reflecting different motivations, can be learnt and efficiently used by RL agents to solve tasks for which exhaustive exploration is prohibitive.
1 code implementation • ICLR 2021 • Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin
Imitation Learning (IL) methods seek to match the behavior of an agent with that of an expert.
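One way to cast this matching is as minimizing a transport cost between the agent's and the expert's state-action distributions. Below is a hedged sketch of a greedy-coupling imitation reward in that spirit; the constants and the exact recipe are illustrative.

```python
# Greedy-coupling imitation reward (illustrative): each visited (s, a)
# consumes its closest not-yet-matched expert atom and is rewarded for a
# small transport distance.
import numpy as np

class GreedyCouplingReward:
    def __init__(self, expert_sa, alpha=5.0):
        self.pool = [np.asarray(e) for e in expert_sa]  # unmatched expert atoms
        self.alpha = alpha

    def __call__(self, agent_sa):
        if not self.pool:
            return 0.0
        dists = [np.linalg.norm(agent_sa - e) for e in self.pool]
        j = int(np.argmin(dists))
        self.pool.pop(j)  # greedy coupling: the match is consumed
        return float(self.alpha * np.exp(-dists[j]))

rewarder = GreedyCouplingReward(np.random.default_rng(0).normal(size=(50, 4)))
print(rewarder(np.zeros(4)))  # call once per environment step
```

In a real agent, the pool of expert atoms would be reset at the start of every episode.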
no code implementations • 3 Jun 2020 • Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver
To test our hypothesis empirically, we augmented a standard deep RL agent with an auxiliary task of learning the value-improvement path.
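A toy rendering of such an auxiliary task, assuming the value-improvement path is approximated by regressing extra heads onto the values of earlier policies (the names and targets below are synthetic):

```python
# Toy auxiliary task: besides the current value head, extra heads regress
# onto the values of earlier policies along the improvement path, on top
# of a shared state representation.
import numpy as np

rng = np.random.default_rng(0)
phi = rng.normal(size=(64, 16))               # shared representation (frozen here)
heads = [np.zeros(16) for _ in range(4)]      # head 0: current; 1..3: past policies
targets = [rng.normal(size=64) for _ in range(4)]  # per-policy value targets

for _ in range(500):
    for head, target in zip(heads, targets):
        err = phi @ head - target
        head -= 0.01 * phi.T @ err / 64       # a real net would also update phi
print([round(float(np.mean((phi @ h - t) ** 2)), 3)
       for h, t in zip(heads, targets)])
```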
3 code implementations • 1 Jun 2020 • Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Abe Friesen, Ruba Haroun, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas
These implementations serve both as a validation of our design decisions and as an important contribution to reproducibility in RL research.
no code implementations • 21 Feb 2019 • Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney
We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution.
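The mean and second moment of the return satisfy exact Bellman-style recursions, which makes them a simple instance of this statistic-estimation view; a runnable sketch on a toy two-state Markov reward process:

```python
# Recursively estimating statistics of the return distribution: the mean
# and second moment obey exact Bellman-style recursions. Toy two-state
# Markov reward process with illustrative dynamics.
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])  # transitions
r = np.array([1.0, 0.0])                # per-state rewards
gamma = 0.9
m = np.zeros(2)                          # m(s) = E[G | s]
u = np.zeros(2)                          # u(s) = E[G^2 | s]

for _ in range(1000):
    m_next, u_next = P @ m, P @ u
    u = r ** 2 + 2 * gamma * r * m_next + gamma ** 2 * u_next
    m = r + gamma * m_next
print("mean:", m, "std:", np.sqrt(u - m ** 2))
```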
no code implementations • 31 Jan 2019 • Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare
We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes.
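The central object here, the set {V^π} over all policies, is easy to inspect numerically in two states; the sketch below samples random stochastic policies in an arbitrary toy MDP and solves for their value functions.

```python
# Numerical peek at the space of value functions: in a 2-state MDP, the
# value functions of many random stochastic policies trace out the
# polytope structure. The MDP below is an arbitrary toy instance.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
P = rng.dirichlet(np.ones(2), size=(2, 2))  # P[a, s] = dist over next states
r = rng.uniform(size=(2, 2))                # r[a, s]

values = []
for _ in range(1000):
    pi = rng.dirichlet(np.ones(2), size=2)  # pi[s, a]
    P_pi = np.einsum('sa,ast->st', pi, P)   # policy-averaged transitions
    r_pi = np.einsum('sa,as->s', pi, r)     # policy-averaged rewards
    values.append(np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi))
values = np.array(values)                   # scatter-plot the columns to see it
print(values.min(axis=0), values.max(axis=0))
```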
no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taïga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.
no code implementations • 12 Nov 2018 • Sophia Collet, Robert Dadashi, Zahi N. Karam, Chang Liu, Parinaz Sobhani, Yevgeniy Vahlis, Ji Chao Zhang
In this work, we propose two approaches for private model aggregation that enable the transfer of knowledge from existing models, trained on other companies' datasets, to a new company with limited labeled data, while protecting each client company's underlying sensitive information.
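As one illustrative recipe in this space (a PATE-style noisy-vote aggregation, named plainly as a stand-in rather than the paper's exact method):

```python
# PATE-style stand-in (not necessarily the paper's exact method): aggregate
# the votes of models trained on other companies' data with Laplace noise,
# then use the noisy labels to supervise the new company's model.
import numpy as np

rng = np.random.default_rng(0)
num_teachers, num_classes = 10, 3
teacher_preds = rng.integers(num_classes, size=(num_teachers, 200))

def noisy_aggregate(preds, epsilon=1.0):
    votes = np.array([np.bincount(preds[:, i], minlength=num_classes)
                      for i in range(preds.shape[1])], dtype=float)
    votes += rng.laplace(scale=1.0 / epsilon, size=votes.shape)  # DP noise
    return votes.argmax(axis=1)  # private labels for the new company's data

print(noisy_aggregate(teacher_preds)[:10])
```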