no code implementations • 18 Mar 2024 • Nadav Merlis, Dorian Baudry, Vianney Perchet
In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead.
no code implementations • 10 Mar 2023 • Dorian Baudry, Kazuya Suzuki, Junya Honda
In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms.
no code implementations • 10 Oct 2022 • Romain Gautron, Dorian Baudry, Myriam Adam, Gatien N Falconnier, Marc Corbeels
Identification of best performing fertilizer practices among a set of contrasting practices with field trials is challenging as crop losses are costly for farmers.
no code implementations • 13 Jun 2022 • Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide, Emilie Kaufmann
Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms.
1 code implementation • 21 Mar 2022 • Dorian Baudry, Yoan Russac, Emilie Kaufmann
In this paper, we contribute to the Extreme Bandit problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward.
no code implementations • NeurIPS 2021 • Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard
The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, exponential family, etc.).
no code implementations • 18 Nov 2021 • Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard
The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, exponential family, etc.).
1 code implementation • 21 Jun 2021 • Dorian Baudry, Yoan Russac, Olivier Cappé
There has been a recent surge of interest in nonparametric bandit algorithms based on subsampling.
1 code implementation • 10 Dec 2020 • Dorian Baudry, Romain Gautron, Emilie Kaufmann, Odalric-Ambrym Maillard
In this paper we study a multi-armed bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward distribution.
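To make the criterion concrete, here is a minimal sketch of an empirical CVaR estimate at level alpha, i.e., the average of the worst alpha-fraction of observed rewards. This is an illustrative estimator, not necessarily the one analyzed in the paper.

```python
import numpy as np

def empirical_cvar(rewards, alpha):
    """Empirical CVaR at level alpha (risk-averse convention):
    the mean of the lowest ceil(alpha * n) observed rewards."""
    x = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))
    return x[:k].mean()
```

For instance, with rewards 1 through 10 and alpha = 0.2, the estimate averages the two smallest values, giving 1.5.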
1 code implementation • NeurIPS 2020 • Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard
In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions).
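As a rough illustration of the re-sampling idea (a generic bootstrap-style sketch, not the paper's actual algorithm), each round every arm's observed rewards are resampled with replacement and the arm with the highest resampled mean is played:

```python
import numpy as np

def bootstrap_bandit_step(histories, rng):
    """One round of a generic re-sampling bandit: for each arm, draw a
    bootstrap resample of its reward history and play the arm whose
    resampled mean is largest. Illustrative sketch only."""
    scores = []
    for h in histories:
        sample = rng.choice(h, size=len(h), replace=True)
        scores.append(sample.mean())
    return int(np.argmax(scores))
```

The randomness of the resampled means plays the exploratory role that posterior sampling plays in Thompson sampling, without requiring a parametric prior.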