Search Results for author: Marc Abeille

Found 11 papers, 1 paper with code

Near-continuous time Reinforcement Learning for continuous state-action spaces

no code implementations • 6 Sep 2023 • Lorenzo Croissant, Marc Abeille, Bruno Bouchard

In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$.

Reinforcement Learning

Jointly Efficient and Optimal Algorithms for Logistic Bandits

2 code implementations • 6 Jan 2022 • Louis Faury, Marc Abeille, Kwang-Sung Jun, Clément Calauzènes

Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance.

Computational Efficiency

Regret Bounds for Generalized Linear Bandits under Parameter Drift

no code implementations • 9 Mar 2021 • Louis Faury, Yoan Russac, Marc Abeille, Clément Calauzènes

Generalized Linear Bandits (GLBs) are powerful extensions to the Linear Bandit (LB) setting, broadening the benefits of reward parametrization beyond linearity.

Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits

no code implementations • 23 Oct 2020 • Marc Abeille, Louis Faury, Clément Calauzènes

It was shown by Faury et al. (2020) that the learning-theoretic difficulties of Logistic Bandits can be embodied by a large (sometimes prohibitively) problem-dependent constant $\kappa$, characterizing the magnitude of the reward's non-linearity.

Real-Time Optimisation for Online Learning in Auctions

no code implementations • ICML 2020 • Lorenzo Croissant, Marc Abeille, Clément Calauzènes

In display advertising, a small group of sellers and bidders face each other in up to $10^{12}$ auctions a day.

Improved Optimistic Algorithms for Logistic Bandits

no code implementations • ICML 2020 • Louis Faury, Marc Abeille, Clément Calauzènes, Olivier Fercoq

For logistic bandits, the frequentist regret guarantees of existing algorithms are $\tilde{\mathcal{O}}(\kappa \sqrt{T})$, where $\kappa$ is a problem-dependent constant.
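The constant $\kappa$ above is standard in the logistic bandit literature: it is the inverse of the smallest slope of the sigmoid link over the decision set, and it can be exponentially large in the norm of the reward parameter. A minimal, hypothetical illustration (the function names and the arm set are my own, not from the paper):

```python
# Hypothetical sketch: compute kappa = 1 / min_x mu'(<x, theta*>) for a logistic
# bandit, where mu is the sigmoid link. Not code from the paper.
import numpy as np

def sigmoid_slope(z):
    """Derivative of the sigmoid mu(z) = 1 / (1 + exp(-z))."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def kappa(arms, theta_star):
    """Inverse of the smallest link slope over the (finite) arm set."""
    slopes = sigmoid_slope(arms @ theta_star)
    return 1.0 / slopes.min()

# Two opposite unit arms: kappa grows quickly with the norm of theta*.
arms = np.array([[1.0, 0.0], [-1.0, 0.0]])
k_small = kappa(arms, np.array([1.0, 0.0]))  # modest non-linearity
k_large = kappa(arms, np.array([5.0, 0.0]))  # slope nearly flat at the extremes
```

This is why a $\tilde{\mathcal{O}}(\kappa \sqrt{T})$ guarantee can be nearly vacuous in practice, and why removing $\kappa$ from the leading term matters.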

Thompson Sampling in Non-Episodic Restless Bandits

no code implementations • 12 Oct 2019 • Young Hun Jung, Marc Abeille, Ambuj Tewari

Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging.

Open-Ended Question Answering
Thompson Sampling

Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems

no code implementations • ICML 2018 • Marc Abeille, Alessandro Lazaric

Thompson sampling (TS) is an effective approach to trade off exploration and exploitation in reinforcement learning.

Thompson Sampling

Thompson Sampling for Linear-Quadratic Control Problems

no code implementations • 27 Mar 2017 • Marc Abeille, Alessandro Lazaric

Despite its empirical and theoretical success in a wide range of problems, from multi-armed to linear bandits, we show that when studying the frequentist regret of TS in control problems, one must trade off the frequency of sampling optimistic parameters against the frequency of switches in the control policy.

Thompson Sampling

Linear Thompson Sampling Revisited

no code implementations • 20 Nov 2016 • Marc Abeille, Alessandro Lazaric

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting.

Thompson Sampling
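For readers unfamiliar with the setting analyzed above, a rough sketch of generic Thompson sampling on a stochastic linear bandit follows. Everything here (the sampling scale `v`, the arm set, the parameter names) is an illustrative assumption of mine, not the paper's algorithm or analysis:

```python
# Hypothetical minimal sketch of linear Thompson sampling; illustrative only.
import numpy as np

def linear_thompson_sampling(arms, theta_star, T=500, lam=1.0, noise=0.1, v=0.5, seed=0):
    """Run TS on a stochastic linear bandit with finite arm set `arms` (K x d).

    Returns the cumulative pseudo-regret against the best fixed arm.
    """
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    V = lam * np.eye(d)              # regularized design matrix
    b = np.zeros(d)                  # running sum of x_t * r_t
    expected = arms @ theta_star     # true expected reward of each arm
    regret = 0.0
    for _ in range(T):
        theta_hat = np.linalg.solve(V, b)                       # ridge estimate
        cov = v**2 * np.linalg.inv(V)                           # sampling covariance
        theta_tilde = rng.multivariate_normal(theta_hat, cov)   # perturbed parameter
        a = int(np.argmax(arms @ theta_tilde))                  # greedy w.r.t. the sample
        x = arms[a]
        r = x @ theta_star + noise * rng.normal()               # noisy linear reward
        V += np.outer(x, x)                                     # rank-one update
        b += r * x
        regret += expected.max() - expected[a]
    return regret
```

The key mechanism the frequentist analysis hinges on is visible here: TS does not build explicit confidence sets but acts greedily on a randomly perturbed estimate, and the perturbation scale `v` controls how often the sampled parameter is optimistic.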
