no code implementations • 20 Jul 2023 • Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology.
no code implementations • 26 Jan 2023 • Kefan Dong, Yannis Flet-Berliac, Allen Nie, Emma Brunskill
We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch, and we propose an empirical algorithm for optimal offline policy selection.
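The selection rule such a bound suggests can be sketched as follows. This is a hypothetical shape with made-up names and penalty coefficients, not the paper's actual bound: penalize each candidate's model-based value estimate by terms for dynamics-model error and distribution mismatch, then pick the candidate with the highest resulting pessimistic score.

```python
def lower_bound(est_value, model_error, dist_mismatch, c1=1.0, c2=1.0):
    """Pessimistic score: value estimate minus penalties for dynamics
    misspecification and distribution mismatch (coefficients illustrative)."""
    return est_value - c1 * model_error - c2 * dist_mismatch

def select_offline_policy(candidates):
    """candidates: name -> (est_value, model_error, dist_mismatch).
    Returns the name of the candidate with the best lower bound."""
    return max(candidates, key=lambda name: lower_bound(*candidates[name]))
```

The point of such pessimism is that a candidate with a high raw value estimate but a poorly fit model can score below a modest but reliable one.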
no code implementations • 16 Oct 2022 • Allen Nie, Yannis Flet-Berliac, Deon R. Jordan, William Steenbergen, Emma Brunskill
Inspired by statistical model selection methods for supervised learning, we introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy when the provided dataset is limited in size.
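A hypothetical sketch of such a pipeline, not the authors' implementation (the function names and the toy per-step importance-sampling scorer are illustrative assumptions): split the fixed offline dataset, train every candidate on the training split, score each resulting policy on the held-out split, and keep the highest-scoring one.

```python
import random

def split_dataset(dataset, frac=0.8, seed=0):
    """Shuffle and split offline transitions into train/validation."""
    data = dataset[:]
    random.Random(seed).shuffle(data)
    cut = int(frac * len(data))
    return data[:cut], data[cut:]

def ope_score(policy, val_data):
    """Toy per-step importance-sampling value estimate.
    Transitions are (state, action, reward, behavior_prob);
    policy(state) returns a dict mapping actions to probabilities."""
    total = 0.0
    for state, action, reward, behavior_prob in val_data:
        total += policy(state).get(action, 0.0) / behavior_prob * reward
    return total / len(val_data)

def select_policy(candidate_trainers, dataset):
    """Train each candidate on the train split, score it on the
    validation split, and return the best (name, score)."""
    train, val = split_dataset(dataset)
    best_name, best_score = None, float("-inf")
    for name, trainer in candidate_trainers.items():
        score = ope_score(trainer(train), val)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

The structure mirrors supervised model selection: candidates never see the held-out transitions used to rank them.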
1 code implementation • 1 Jul 2022 • Yao Liu, Yannis Flet-Berliac, Emma Brunskill
Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications.
no code implementations • 20 Apr 2022 • Yannis Flet-Berliac, Debabrota Basu
In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function given the adversary's policy.
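A toy scalar sketch of that max-min structure, not the SAAC algorithm itself (the reward, cost, and all constants are made up for illustration): the adversary parameter `y` ascends the constraint cost, while the agent parameter `x` ascends reward minus a penalty for exceeding the cost limit, given the adversary's current choice.

```python
def reward(x):
    return -(x - 2.0) ** 2          # unconstrained optimum at x = 2

def cost(x, y):
    return x * y                    # adversary y in [0, 1] amplifies cost

def grad(f, z, eps=1e-4):
    """Central finite-difference gradient of a scalar function."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

def train(steps=2000, lr=0.01, lam=10.0, limit=1.0):
    x, y = 0.0, 0.5
    for _ in range(steps):
        # Adversary step: push the constraint cost up, given the agent.
        y = min(1.0, max(0.0, y + lr * grad(lambda yy: cost(x, yy), y)))
        # Agent step: constrained objective, given the adversary.
        obj = lambda xx: reward(xx) - lam * max(0.0, cost(xx, y) - limit)
        x += lr * grad(obj, x)
    return x, y
```

With the adversary driving `y` to its worst case, the agent settles near the constraint boundary rather than at the unconstrained reward optimum, which is the intended effect of training against an adversary.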
1 code implementation • ICLR 2021 • Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist
Despite their definite success on deep reinforcement learning problems, actor-critic algorithms still suffer from sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck.
no code implementations • ICLR 2021 • Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux
We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms.
no code implementations • 26 Sep 2019 • Yannis Flet-Berliac, Philippe Preux
In this paper, we introduce and define MERL, the multi-head reinforcement learning framework we use throughout this work.
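A minimal sketch of what a multi-head agent can look like (the class and head names are illustrative assumptions, not MERL's actual architecture): a shared trunk feeds a policy head plus named auxiliary prediction heads, so the auxiliary losses shape the shared representation.

```python
class MultiHeadAgent:
    """Shared representation feeding one policy head and several
    auxiliary prediction heads."""

    def __init__(self, trunk, policy_head, aux_heads):
        self.trunk = trunk              # obs -> shared features
        self.policy_head = policy_head  # features -> action output
        self.aux_heads = aux_heads      # name -> (features -> prediction)

    def forward(self, obs):
        features = self.trunk(obs)
        policy_out = self.policy_head(features)
        aux_out = {name: head(features) for name, head in self.aux_heads.items()}
        return policy_out, aux_out
```

In a real agent the heads would be neural network layers and the auxiliary predictions would contribute extra terms to the training loss.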
no code implementations • 25 Sep 2019 • Yannis Flet-Berliac, Philippe Preux
In this work, Vex is used to evaluate the impact each transition will have on learning: this criterion refines sampling and improves the policy gradient algorithm.
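Reading Vex as the fraction of the return variance explained by the value function (an assumption about the notation, though explained variance is a standard quantity), a plain-Python sketch:

```python
def explained_variance(returns, values):
    """Vex = 1 - Var(G - V) / Var(G): the share of the return variance
    accounted for by the value function. 1.0 means a perfect fit,
    0.0 means the critic explains nothing."""
    n = len(returns)
    mean_g = sum(returns) / n
    var_g = sum((g - mean_g) ** 2 for g in returns) / n
    residuals = [g - v for g, v in zip(returns, values)]
    mean_r = sum(residuals) / n
    var_r = sum((r - mean_r) ** 2 for r in residuals) / n
    return 1.0 - var_r / var_g if var_g > 0 else 0.0
```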
no code implementations • 8 Apr 2019 • Yannis Flet-Berliac, Philippe Preux
In this work, we use this metric to select samples that are useful to learn from, and we demonstrate that this selection can significantly improve the performance of policy gradient methods.
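One hypothetical instantiation of such a selection rule (an illustrative assumption, not necessarily the authors' exact criterion): keep the transitions whose returns the critic currently explains worst, and compute the policy-gradient update only on those.

```python
def select_transitions(returns, values, keep_frac=0.5):
    """Indices of the transitions with the largest |return - value|,
    i.e. those the critic currently explains worst."""
    residual = [abs(g - v) for g, v in zip(returns, values)]
    k = max(1, int(keep_frac * len(residual)))
    order = sorted(range(len(residual)), key=lambda i: residual[i], reverse=True)
    return sorted(order[:k])
```

A policy-gradient update would then be computed over `returns[i]` and the corresponding log-probabilities for the selected indices only, rather than over the whole batch.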