no code implementations • 19 Jul 2023 • Pierre Clavier, Tom Huix, Alain Durmus
In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.
no code implementations • 10 Feb 2023 • Pierre Clavier, Erwan Le Pennec, Matthieu Geist
In this paper, we consider uncertainty sets defined with an $L_p$-ball (recovering the TV case), and study the sample complexity of any planning algorithm (with a high-accuracy guarantee on the solution) applied to an empirical RMDP estimated using the generative model.
no code implementations • 14 Jun 2022 • Pierre Clavier, Stéphanie Allassonière, Erwan Le Pennec
Robust Reinforcement Learning tries to make predictions more robust to changes in the dynamics or rewards of the system.
Distributional Reinforcement Learning, Reinforcement Learning