no code implementations • 6 Sep 2023 • Lorenzo Croissant, Marc Abeille, Bruno Bouchard
In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$.
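As a concrete illustration (not taken from the paper), a pure-jump state process on $\mathbb{R}^d$ can be simulated by drawing exponential waiting times from a Poisson clock and sampling the post-jump state from a transition kernel; the jump rate and the Gaussian kernel below are illustrative assumptions.

```python
import numpy as np

def simulate_jump_process(x0, horizon, rate, sample_kernel, rng):
    """Simulate a pure-jump process on R^d: the state is piecewise
    constant and jumps at Poisson(rate) event times, with the
    post-jump state drawn from sample_kernel(x)."""
    t, x = 0.0, np.asarray(x0, dtype=float)
    path = [(t, x.copy())]
    while True:
        t += rng.exponential(1.0 / rate)  # waiting time to the next jump
        if t > horizon:
            break
        x = sample_kernel(x)              # draw the next state from the kernel
        path.append((t, x.copy()))
    return path

rng = np.random.default_rng(0)
# Illustrative Gaussian transition kernel centered at the current state.
kernel = lambda x: x + rng.normal(size=x.shape)
path = simulate_jump_process(np.zeros(3), horizon=10.0, rate=2.0,
                             sample_kernel=kernel, rng=rng)
```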
2 code implementations • 6 Jan 2022 • Louis Faury, Marc Abeille, Kwang-Sung Jun, Clément Calauzènes
Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance.
no code implementations • 9 Mar 2021 • Louis Faury, Yoan Russac, Marc Abeille, Clément Calauzènes
Generalized Linear Bandits (GLBs) are powerful extensions to the Linear Bandit (LB) setting, broadening the benefits of reward parametrization beyond linearity.
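For reference, the standard GLB reward model replaces the linear mean reward with a link function $\mu$ applied to a linear score; the logistic case (consistent with the logistic-bandit papers elsewhere on this page) is the canonical instance:

```latex
% Generalized linear reward model: the mean reward is a link function
% of a linear score. The logistic link is the standard instance.
\mathbb{E}[r_t \mid x_t] = \mu(x_t^\top \theta_\star),
\qquad \mu(z) = \frac{1}{1 + e^{-z}} \quad \text{(logistic case)}.
```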
no code implementations • 23 Oct 2020 • Marc Abeille, Louis Faury, Clément Calauzènes
It was shown by Faury et al. (2020) that the learning-theoretic difficulties of Logistic Bandits can be embodied by a large (sometimes prohibitively so) problem-dependent constant $\kappa$, characterizing the magnitude of the reward's non-linearity.
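For readers unfamiliar with this constant, $\kappa$ is standardly defined (as in Faury et al., 2020) as the worst-case inverse slope of the logistic link over the arm set and parameter set:

```latex
% kappa measures the flatness of the logistic link mu over the problem:
% a large kappa means the reward signal is nearly flat, hence hard to learn.
\kappa := \sup_{x \in \mathcal{X},\, \theta \in \Theta}
\frac{1}{\dot{\mu}(x^\top \theta)},
\qquad \dot{\mu}(z) = \mu(z)\bigl(1 - \mu(z)\bigr).
```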
no code implementations • ICML 2020 • Lorenzo Croissant, Marc Abeille, Clément Calauzènes
In display advertising, a small group of sellers and bidders face each other in up to $10^{12}$ auctions a day.
no code implementations • ICML 2020 • Marc Abeille, Alessandro Lazaric
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting.
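As a rough illustration of the setting (not the paper's algorithm), a learner holding an estimate $(\hat{A}, \hat{B})$ of the dynamics can act by certainty equivalence: solve the discrete-time Riccati equation for that estimate and play the resulting linear feedback. The sketch below uses SciPy; the system and cost matrices are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Certainty-equivalent LQR: solve the discrete algebraic Riccati
    equation for P and return the optimal feedback gain K (u = -K x)."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Illustrative 2-dimensional system and unit costs (assumptions, not from the paper).
A_hat = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
B_hat = np.array([[0.0],
                  [0.1]])
K = lqr_gain(A_hat, B_hat, Q=np.eye(2), R=np.eye(1))
x = np.array([1.0, 0.0])
u = -K @ x  # control played by the certainty-equivalent policy
```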
no code implementations • ICML 2020 • Louis Faury, Marc Abeille, Clément Calauzènes, Olivier Fercoq
For logistic bandits, the frequentist regret guarantees of existing algorithms are $\tilde{\mathcal{O}}(\kappa \sqrt{T})$, where $\kappa$ is a problem-dependent constant.
no code implementations • 12 Oct 2019 • Young Hun Jung, Marc Abeille, Ambuj Tewari
Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging.
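To make "time-varying" concrete: in a restless bandit, every arm's state evolves at every round whether or not the arm is pulled. A minimal two-state Markov-modulated arm (illustrative parameters, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two-state Markov chain governing one arm's reward distribution; the
# state transitions every round, even when the arm is not pulled ("restless").
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # state transition matrix
means = np.array([0.2, 0.8])        # mean reward in each state

state = 0
for t in range(5):
    reward = rng.binomial(1, means[state])  # observed only if the arm is pulled
    state = rng.choice(2, p=P[state])       # evolves regardless of the pull
```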
no code implementations • ICML 2018 • Marc Abeille, Alessandro Lazaric
Thompson sampling (TS) is an effective approach to trade off exploration and exploitation in reinforcement learning.
no code implementations • 27 Mar 2017 • Marc Abeille, Alessandro Lazaric
Despite its empirical and theoretical success in a wide range of problems, from multi-armed bandits to linear bandits, we show that when studying the frequentist regret of TS in control problems, one must trade off the frequency of sampling optimistic parameters against the frequency of switches in the control policy.
no code implementations • 20 Nov 2016 • Marc Abeille, Alessandro Lazaric
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting.
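For concreteness, one round of Thompson sampling in the linear bandit setting follows the standard template below (a sketch, not the paper's exact algorithm); the environment, prior scale, and the inflation parameter `v` are illustrative assumptions.

```python
import numpy as np

def lin_ts_round(V, b, arms, v, rng):
    """One round of linear Thompson sampling: sample a parameter from a
    Gaussian centered at the regularized least-squares estimate, then
    act greedily with respect to the sampled parameter."""
    theta_hat = np.linalg.solve(V, b)           # regularized LS estimate
    cov = v**2 * np.linalg.inv(V)               # (inflated) sampling covariance
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    return arms[np.argmax(arms @ theta_tilde)]  # greedy w.r.t. the sample

rng = np.random.default_rng(0)
d = 3
V, b = np.eye(d), np.zeros(d)       # design matrix and response statistics
arms = rng.normal(size=(10, d))
x = lin_ts_round(V, b, arms, v=1.0, rng=rng)
reward = x @ np.ones(d) + rng.normal()          # illustrative environment
V += np.outer(x, x); b += reward * x            # update the statistics
```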