Search Results for author: Dorian Baudry

Found 10 papers, 4 papers with code

The Value of Reward Lookahead in Reinforcement Learning

no code implementations • 18 Mar 2024 • Nadav Merlis, Dorian Baudry, Vianney Perchet

In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead.

Offline RL • Reinforcement Learning +1

A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

no code implementations • 10 Mar 2023 • Dorian Baudry, Kazuya Suzuki, Junya Honda

In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms.

Thompson Sampling
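
The regret analysis above targets randomized algorithms such as Thompson Sampling. As a concrete illustration (not the paper's method), here is a minimal Beta-Bernoulli Thompson Sampling loop on simulated arms; `arm_means`, `horizon`, and the seed are hypothetical parameters for the sketch.

```python
import random

def thompson_sampling(arm_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on simulated arms (sketch).

    Each arm keeps a Beta(successes + 1, failures + 1) posterior; each
    round we draw one sample per posterior and pull the argmax arm.
    Returns the cumulative (pseudo-)regret against the best arm's mean.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [0] * k
    failures = [0] * k
    best = max(arm_means)
    regret = 0.0
    for _ in range(horizon):
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < arm_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - arm_means[arm]
    return regret
```

The randomness of the posterior draws is precisely what makes the regret analysis of such algorithms delicate, which is the gap the paper's general methodology addresses.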

Towards an efficient and risk aware strategy for guiding farmers in identifying best crop management

no code implementations • 10 Oct 2022 • Romain Gautron, Dorian Baudry, Myriam Adam, Gatien N Falconnier, Marc Corbeels

Identifying the best-performing fertilizer practices among a set of contrasting practices through field trials is challenging, as crop losses are costly for farmers.

Management

Top Two Algorithms Revisited

no code implementations • 13 Jun 2022 • Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide, Emilie Kaufmann

Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms.

Thompson Sampling • Vocal Bursts Valence Prediction
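
One round of a generic Top Two sampling rule can be sketched as follows; this is an illustrative outline of the family the paper revisits, not its specific algorithm, and `posterior_samplers` and `beta` are hypothetical names for the sketch.

```python
import random

def top_two_choice(posterior_samplers, beta=0.5, rng=None):
    """One round of a Top Two sampling rule (sketch).

    posterior_samplers: list of zero-argument callables, one per arm,
    each returning a draw from that arm's posterior. The leader is the
    argmax of one joint posterior draw; with probability beta we play
    the leader, otherwise we redraw until a different arm (the
    challenger) comes out on top, and play that one.
    """
    rng = rng or random.Random()
    draw = [s() for s in posterior_samplers]
    leader = draw.index(max(draw))
    if rng.random() < beta:
        return leader
    # Challenger: redraw the posteriors until some other arm wins.
    while True:
        draw = [s() for s in posterior_samplers]
        challenger = draw.index(max(draw))
        if challenger != leader:
            return challenger
```

The parameter beta trades exploration of the current best guess against its closest competitors, which is the mechanism the fixed-confidence analysis revolves around.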

Efficient Algorithms for Extreme Bandits

1 code implementation • 21 Mar 2022 • Dorian Baudry, Yoan Russac, Emilie Kaufmann

In this paper, we contribute to the Extreme Bandit problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward.

Multi-Armed Bandits
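
In the extreme bandit setting the objective is the expected maximum of the collected rewards rather than their sum. A minimal Monte-Carlo estimate of that objective for a fixed pull allocation can be sketched as follows; `sample_arm`, `allocation`, and `n_runs` are hypothetical names, and this is not the paper's algorithm.

```python
import random

def expected_max_reward(sample_arm, allocation, n_runs=500, seed=0):
    """Monte-Carlo estimate of the extreme-bandit objective (sketch).

    sample_arm(i, rng) draws one reward from arm i; allocation[i] is
    how many times arm i is pulled. The objective is the expected
    maximum reward observed over all pulls, not the expected sum.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        best = float("-inf")
        for i, n in enumerate(allocation):
            for _ in range(n):
                best = max(best, sample_arm(i, rng))
        total += best
    return total / n_runs
```

Under this objective a heavy-tailed arm can dominate even when its mean is small, which is why standard regret-minimizing strategies are not directly suitable.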

From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits

no code implementations • NeurIPS 2021 • Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard

The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g. bounded with known support, exponential family, etc.).

Decision Making

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits

no code implementations • 18 Nov 2021 • Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard

The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g. bounded with known support, exponential family, etc.).

Decision Making

On Limited-Memory Subsampling Strategies for Bandits

1 code implementation • 21 Jun 2021 • Dorian Baudry, Yoan Russac, Olivier Cappé

There has been a recent surge of interest in nonparametric bandit algorithms based on subsampling.
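
The core primitive in such subsampling algorithms is a duel that compares a challenger against an equally sized sub-sample of the leader's longer history. A BESA-style version of that comparison can be sketched as follows; it is an illustration of the general idea, not the paper's limited-memory variant, and the function and argument names are hypothetical.

```python
import random

def subsample_duel(leader_rewards, challenger_rewards, rng=None):
    """One duel in a sub-sampling bandit algorithm (sketch).

    The leader's longer reward history is sub-sampled without
    replacement down to the challenger's sample size, so both arms
    are compared on equal amounts of data; the arm with the larger
    sub-sample mean wins the duel.
    """
    rng = rng or random.Random()
    n = len(challenger_rewards)
    sub = rng.sample(leader_rewards, n)
    mean = lambda xs: sum(xs) / len(xs)
    return "challenger" if mean(challenger_rewards) > mean(sub) else "leader"
```

Because the comparison uses only empirical sub-samples, no parametric assumption on the reward distributions is needed, which is what makes these algorithms nonparametric.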

Optimal Thompson Sampling strategies for support-aware CVaR bandits

1 code implementation • 10 Dec 2020 • Dorian Baudry, Romain Gautron, Emilie Kaufmann, Odalric-Ambrym Maillard

In this paper we study a multi-armed bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward distribution.

Thompson Sampling
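
For a reward distribution, CVaR at level alpha is the mean of the worst alpha-fraction of outcomes, so maximizing it favors arms with good worst-case behavior. A simple empirical estimator can be sketched as follows; note that several estimator conventions exist (tail-size rounding, interpolation at the quantile), and this sketch picks one.

```python
def empirical_cvar(rewards, alpha):
    """Empirical Conditional Value at Risk at level alpha (sketch).

    Sorts the observed rewards and averages the worst alpha-fraction
    of them (the lower tail). The tail size is rounded down, with a
    minimum of one observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    xs = sorted(rewards)
    k = max(1, int(len(xs) * alpha))  # size of the worst tail
    return sum(xs[:k]) / k
```

At alpha = 1 this reduces to the ordinary empirical mean, so CVaR interpolates between mean-based and worst-case criteria.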

Sub-sampling for Efficient Non-Parametric Bandit Exploration

1 code implementation • NeurIPS 2020 • Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard

In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions).

Thompson Sampling
