Search Results for author: Pierre Clavier

Found 3 papers, 0 papers with code

VITS: Variational Inference Thompson Sampling for contextual bandits

no code implementations • 19 Jul 2023 • Pierre Clavier, Tom Huix, Alain Durmus

In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.

Multi-Armed Bandits, Thompson Sampling

Towards Minimax Optimality of Model-based Robust Reinforcement Learning

no code implementations • 10 Feb 2023 • Pierre Clavier, Erwan Le Pennec, Matthieu Geist

In this paper, we consider uncertainty sets defined with an $L_p$-ball (recovering the TV case), and study the sample complexity of \emph{any} planning algorithm (with high accuracy guarantee on the solution) applied to an empirical RMDP estimated using the generative model.

Reinforcement Learning (RL)
