Search Results for author: Léonard Blier

Found 7 papers, 2 papers with code

Magistral

no code implementations 12 Jun 2025 Mistral-AI, Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Andy Ehrenberg, Anmol Agarwal, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Chris Bamford, Christian Wallenwein, Christophe Renaudin, Clémence Lanfranchi, Darius Dabert, Devon Mizelle, Diego de Las Casas, Elliot Chane-Sane, Emilien Fugier, Emma Bou Hanna, Gauthier Delerce, Gauthier Guinet, Georgii Novikov, Guillaume Martin, Himanshu Jaju, Jan Ludziejewski, Jean-Hadrien Chabran, Jean-Malo Delignon, Joachim Studnia, Jonas Amar, Josselin Somerville Roberts, Julien Denize, Karan Saxena, Kush Jain, Lingxiao Zhao, Louis Martin, Luyu Gao, Lélio Renard Lavaud, Marie Pellat, Mathilde Guillaumin, Mathis Felardos, Maximilian Augustin, Mickaël Seznec, Nikhil Raghuraman, Olivier Duchenne, Patricia Wang, Patrick von Platen, Patryk Saffer, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Pavankumar Reddy Muddireddy, Philomène Chagniot, Pierre Stock, Pravesh Agrawal, Romain Sauvestre, Rémi Delacourt, Sanchit Gandhi, Sandeep Subramanian, Shashwat Dalal, Siddharth Gandhi, Soham Ghosh, Srijan Mishra, Sumukh Aithal, Szymon Antoniak, Thibault Schueller, Thibaut Lavril, Thomas Robert, Thomas Wang, Timothée Lacroix, Valeriia Nemychnikova, Victor Paltz, Virgile Richard, Wen-Ding Li, William Marshall, Xuanyu Zhang, Yunhao Tang

We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline.

Instruction Following Reinforcement Learning (RL)

Unbiased Methods for Multi-Goal Reinforcement Learning

no code implementations 16 Jun 2021 Léonard Blier, Yann Ollivier

We introduce unbiased deep Q-learning and actor-critic algorithms that can handle the infinitely sparse rewards arising in continuous multi-goal settings, and test them in toy environments.

Multi-Goal Reinforcement Learning Q-Learning +3
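
In a continuous goal space, the exact-reach reward R(s, g) = 1{s = g} is zero almost surely, which is the "infinitely sparse" regime this paper targets. Below is a minimal sketch of the problem itself, not of the paper's unbiased algorithms; the environment and names are made up for the toy:

```python
import numpy as np

# Illustration (not the paper's algorithm): in a continuous goal space,
# the exact-reach reward R(s, g) = 1{s == g} is "infinitely sparse" --
# a random policy essentially never observes a nonzero reward, so a
# naive goal-conditioned Q-learning update receives no learning signal.

rng = np.random.default_rng(0)
goal = rng.uniform(size=2)           # goal sampled in [0, 1]^2

hits = 0
for _ in range(100_000):
    state = rng.uniform(size=2)      # states visited by a random policy
    hits += float(np.allclose(state, goal))   # exact-match reward

print(f"nonzero rewards observed: {hits}")    # 0, almost surely
```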

Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint

no code implementations 18 Jan 2021 Léonard Blier, Corentin Tallec, Yann Ollivier

In reinforcement learning, temporal difference-based algorithms can be sample-inefficient: for instance, with sparse rewards, no learning occurs until a reward is observed.
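
For context, the successor-state operator M(s, s') counts the expected discounted visits to s' starting from s and satisfies the Bellman equation M = I + γPM, so it can be learned by temporal difference from reward-free transitions, before any reward is ever seen. Here is a tabular sketch under an assumed random-walk environment; the paper's subject is the general mathematical and function-approximation treatment, not this toy:

```python
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.05
# Assumed environment: a symmetric random walk on a ring of 5 states.
P = np.zeros((n_states, n_states))
for s in range(n_states):
    P[s, (s - 1) % n_states] = P[s, (s + 1) % n_states] = 0.5

M = np.zeros((n_states, n_states))      # estimate of the successor-state matrix
rng = np.random.default_rng(0)
s = 0
for _ in range(50_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD target for row s: one-hot visit to s now, plus discounted future visits
    target = np.eye(n_states)[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])
    s = s_next

# Compare with the closed form M* = (I - gamma * P)^(-1); the error is
# small but nonzero (constant step size, so only approximate convergence).
M_star = np.linalg.inv(np.eye(n_states) - gamma * P)
print(np.max(np.abs(M - M_star)))
```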

Making Deep Q-learning methods robust to time discretization

1 code implementation 28 Jan 2019 Corentin Tallec, Léonard Blier, Yann Ollivier

Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018).

Deep Reinforcement Learning Q-Learning
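
The core failure mode studied here is that, as the time step δt shrinks, Q(s, a) ≈ V(s) + δt·A(s, a): the action gap that greedy selection relies on vanishes linearly in δt while approximation noise does not, which motivates the paper's fix of learning advantages (Deep Advantage Updating) rather than raw Q-values. A toy numeric sketch of the collapse, with assumed reward rates and next-state value:

```python
# Two actions with reward rates 1.0 and 0.0 for one step of length dt,
# after which both reach the same state with value V (assumed numbers).
gamma, V = 0.99, 10.0   # per-unit-time discount and next-state value

for dt in [1.0, 0.1, 0.01, 0.001]:
    q_good = 1.0 * dt + gamma**dt * V   # reward rate 1.0 for dt seconds
    q_bad  = 0.0 * dt + gamma**dt * V   # reward rate 0.0 for dt seconds
    print(f"dt={dt:6}: Q gap = {q_good - q_bad:.4f}")

# The gap between Q-values shrinks linearly with dt, so any fixed amount
# of function-approximation noise eventually swamps the action ranking.
```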

Learning with Random Learning Rates

1 code implementation 2 Oct 2018 Léonard Blier, Pierre Wolinski, Yann Ollivier

Hyperparameter tuning is a bothersome step in the training of deep learning models.
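
The idea (Alrao) is to sidestep tuning by sampling learning rates at random, log-uniformly over a wide interval, and training everything at once. A simplified PyTorch sketch follows, with one random rate per layer rather than per unit as in the paper, and an arbitrary toy model:

```python
import math
import torch
from torch import nn, optim

def log_uniform(lo=1e-5, hi=1e1, generator=None):
    """Sample a learning rate log-uniformly in [lo, hi]."""
    u = torch.rand(1, generator=generator).item()
    return math.exp(math.log(lo) + u * (math.log(hi) - math.log(lo)))

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
g = torch.Generator().manual_seed(0)
param_groups = [
    {"params": layer.parameters(), "lr": log_uniform(generator=g)}
    for layer in model
    if list(layer.parameters())          # skip parameter-free layers (ReLU)
]
opt = optim.SGD(param_groups, lr=1e-2)   # per-group "lr" overrides this default

# usual training step (dummy data)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

The bet is that with rates spread log-uniformly, some parts of the network land near a good rate, which is often enough to train competitively without any tuning.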

Learning with Random Learning Rates

no code implementations 27 Sep 2018 Léonard Blier, Pierre Wolinski, Yann Ollivier

Hyperparameter tuning is a bothersome step in the training of deep learning models.

The Description Length of Deep Learning Models

no code implementations NeurIPS 2018 Léonard Blier, Yann Ollivier

This might explain the relatively poor practical performance of variational methods in deep learning.

Deep Learning
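
One codelength the paper considers is prequential (online) coding: encode each sample with the model trained on all previous samples, for a total of Σₜ −log₂ p(xₜ | x₍<t₎) bits; under this criterion deep models compress training data well, whereas variational codes do not. A minimal sketch with a toy categorical model (the Laplace smoothing is an arbitrary choice for the demo, not the paper's models):

```python
import math
from collections import Counter

def prequential_codelength(data, alphabet_size):
    """Total bits to encode `data` online: predict each symbol from the
    past with an add-one-smoothed categorical model, then update."""
    counts = Counter()
    seen, bits = 0, 0.0
    for symbol in data:
        p = (counts[symbol] + 1) / (seen + alphabet_size)  # predictive prob
        bits += -math.log2(p)
        counts[symbol] += 1
        seen += 1
    return bits

data = [0, 0, 1, 0, 0, 0, 1, 0] * 100   # a compressible toy stream
print(f"{prequential_codelength(data, alphabet_size=2):.1f} bits "
      f"vs {len(data)} bits uncompressed")
```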
