no code implementations • 16 Jun 2021 • Léonard Blier, Yann Ollivier
We introduce unbiased deep Q-learning and actor-critic algorithms that can handle such infinitely sparse rewards, and test them in toy environments.
no code implementations • 18 Jan 2021 • Léonard Blier, Corentin Tallec, Yann Ollivier
In reinforcement learning, temporal difference-based algorithms can be sample-inefficient: for instance, with sparse rewards, no learning occurs until a reward is observed.
1 code implementation • 28 Jan 2019 • Corentin Tallec, Léonard Blier, Yann Ollivier
Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018).
1 code implementation • 2 Oct 2018 • Léonard Blier, Pierre Wolinski, Yann Ollivier
Hyperparameter tuning is a bothersome step in the training of deep learning models.
no code implementations • 27 Sep 2018 • Léonard Blier, Pierre Wolinski, Yann Ollivier
Hyperparameter tuning is a bothersome step in the training of deep learning models.
no code implementations • NeurIPS 2018 • Léonard Blier, Yann Ollivier
This might explain the relatively poor practical performance of variational methods in deep learning.