no code implementations • 5 Jun 2024 • Riccardo Poiani, Rémy Degenne, Emilie Kaufmann, Alberto Maria Metelli, Marcello Restelli
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward, at a specified accuracy, as fast as possible.
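For illustration, here is a minimal fixed-confidence sketch in the spirit of successive elimination; the Bernoulli arms, confidence radius, and stopping rule below are our own simplifications, not the algorithm studied in this paper.

```python
import math
import random

def successive_elimination(means, delta=0.05):
    """Illustrative fixed-confidence best-arm identification: sample every
    surviving arm once per round, then eliminate any arm whose upper
    confidence bound falls below the best lower confidence bound."""
    K = len(means)
    active = list(range(K))
    sums = [0.0] * K
    t = 0
    while len(active) > 1:
        t += 1
        for a in active:
            sums[a] += float(random.random() < means[a])  # one Bernoulli pull
        # Hoeffding radius with a crude union bound over arms and rounds
        rad = math.sqrt(math.log(4 * K * t * t / delta) / (2 * t))
        best_lcb = max(sums[a] / t - rad for a in active)
        active = [a for a in active if sums[a] / t + rad >= best_lcb]
    return active[0]

print(successive_elimination([0.3, 0.5, 0.7]))
```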
no code implementations • 4 Jun 2024 • Tuan Dam, Odalric-Ambrym Maillard, Emilie Kaufmann
Monte-Carlo Tree Search (MCTS) is a widely used strategy for online planning that combines Monte-Carlo sampling with forward tree search.
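A minimal UCT-style sketch of that combination on a toy two-action environment; the environment, exploration constant, and uniform rollout policy are assumptions of the example, not the estimators studied in this paper.

```python
import math
import random

ACTIONS = [0, 1]

class Node:
    def __init__(self):
        self.n = 0          # visit count
        self.w = 0.0        # sum of returns observed through this node
        self.children = {}  # action -> Node

def step(action):
    """Toy stochastic environment: action 1 pays off more on average."""
    return float(random.random() < (0.7 if action == 1 else 0.4))

def rollout(depth):
    """Monte-Carlo evaluation: play uniformly at random to the horizon."""
    return sum(step(random.choice(ACTIONS)) for _ in range(depth))

def simulate(node, depth, c=1.4):
    """One MCTS iteration: UCB1 selection down the tree, then a rollout."""
    if depth == 0:
        return 0.0
    def score(a):  # unvisited actions are tried first
        ch = node.children.get(a)
        if ch is None or ch.n == 0:
            return float("inf")
        return ch.w / ch.n + c * math.sqrt(math.log(node.n) / ch.n)
    a = max(ACTIONS, key=score)
    child = node.children.setdefault(a, Node())
    r = step(a)
    if child.n == 0:
        value = r + rollout(depth - 1)             # expand, then rollout
    else:
        value = r + simulate(child, depth - 1, c)  # descend the tree
    child.n += 1
    child.w += value
    node.n += 1
    return value

root = Node()
for _ in range(2000):
    simulate(root, depth=5)
print({a: round(ch.w / ch.n, 2) for a, ch in root.children.items()})
```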
no code implementations • 27 May 2024 • Adrienne Tuynman, Rémy Degenne, Emilie Kaufmann
In this case, it is known that there exists an MDP with $D \simeq H$ for which the sample complexity to output an $\varepsilon$-optimal policy is $\Omega(SAD/\varepsilon^2)$ where $S$ and $A$ are the sizes of the state and action spaces.
no code implementations • 7 Nov 2023 • Cyrille Kone, Emilie Kaufmann, Laura Richert
We study a multi-objective pure exploration problem in a multi-armed bandit model.
no code implementations • 31 Oct 2023 • Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann
In this paper, we propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy in any tabular episodic MDP.
no code implementations • 23 Jun 2023 • Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann
In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one.
no code implementations • 3 Oct 2022 • Marc Jourdan, Rémy Degenne, Emilie Kaufmann
The problem of identifying the best arm among a collection of items with Gaussian reward distributions is well understood when the variances are known.
no code implementations • 12 Jul 2022 • Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann
Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view.
no code implementations • 13 Jun 2022 • Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide, Emilie Kaufmann
Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification for parametric families of arms in multi-armed bandit models (Russo, 2016).
1 code implementation • 31 May 2022 • Clémence Réda, Sattar Vakili, Emilie Kaufmann
In this paper, we provide new lower bounds on the sample complexity of pure exploration and on the regret.
1 code implementation • 21 Mar 2022 • Dorian Baudry, Yoan Russac, Emilie Kaufmann
In this paper, we study the Extreme Bandit problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward.
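A toy sketch of the extreme-bandit objective, where the score is the single largest reward collected; the Pareto arms and the naive max-based baseline are ours, for exposition only.

```python
import random

def pareto(alpha):
    """One Pareto(alpha) sample; smaller alpha means a heavier tail."""
    return (1.0 - random.random()) ** (-1.0 / alpha)

def extreme_follow_the_leader(alphas, horizon):
    """Naive baseline: pull each arm once, then keep pulling the arm whose
    largest observed reward so far is the biggest."""
    best = [pareto(a) for a in alphas]  # one initial pull per arm
    for _ in range(horizon - len(alphas)):
        i = max(range(len(alphas)), key=lambda k: best[k])
        best[i] = max(best[i], pareto(alphas[i]))
    return max(best)  # the learner's score is the single largest reward

print(extreme_follow_the_leader([3.0, 1.5], horizon=1000))
```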
no code implementations • 17 Mar 2022 • Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann
In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$.
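In standard notation, with $\hat{\pi}$ the returned policy and $V^{\star}$ the optimal value, the $(\epsilon,\delta)$-PAC requirement reads

$$\mathbb{P}\left( V^{\hat{\pi}} \ge V^{\star} - \epsilon \right) \ge 1 - \delta.$$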
1 code implementation • 18 Mar 2021 • Clémence Réda, Emilie Kaufmann, Andrée Delahaye-Duriez
Motivated by an application to drug repurposing, we propose the first algorithms to tackle the identification of the $m \ge 1$ arms with the largest means in a linear bandit model, in the fixed-confidence setting.
1 code implementation • 10 Dec 2020 • Dorian Baudry, Romain Gautron, Emilie Kaufmann, Odalric-Ambrym Maillard
In this paper we study a multi-armed bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level $\alpha$ of the reward distribution.
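A minimal empirical estimator of this criterion; the lower-tail convention below (averaging the worst $\alpha$-fraction of outcomes, the risk-averse choice for rewards) is an assumption of the sketch.

```python
import math

def empirical_cvar(samples, alpha):
    """Average of the worst alpha-fraction of the samples (lower tail)."""
    xs = sorted(samples)
    k = max(1, math.ceil(alpha * len(xs)))
    return sum(xs[:k]) / k

print(empirical_cvar([0.1, 0.9, 0.5, 0.2, 0.8], alpha=0.4))  # mean of the two worst
```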
1 code implementation • NeurIPS 2020 • Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard
In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions).
no code implementations • 7 Oct 2020 • Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko
In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode.
no code implementations • 27 Jul 2020 • Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Emilie Kaufmann, Edouard Leurent, Michal Valko
Realistic environments often provide agents with very limited feedback.
no code implementations • 9 Jul 2020 • Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko
In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric.
no code implementations • 11 Jun 2020 • Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko
Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel.
no code implementations • NeurIPS 2020 • Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support.
1 code implementation • 12 Apr 2020 • Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric.
no code implementations • 6 Dec 2019 • Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes
Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization problems over rank-one matrices of arms.
no code implementations • 24 Oct 2019 • Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko
We investigate and provide new insights into the sampling rule called Top-Two Thompson Sampling (TTTS).
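A minimal sketch of the TTTS sampling rule for Bernoulli arms with Beta posteriors; the posterior bookkeeping and the resampling cap are assumptions of the example.

```python
import random

def ttts_choose(a, b, beta=0.5, max_resample=100):
    """Top-Two Thompson Sampling: sample a mean vector from the posterior
    and take its argmax as the leader; with probability beta play the
    leader, otherwise resample until a different arm (the challenger)
    comes out on top, and play that one.
    a, b: per-arm Beta posterior parameters."""
    K = len(a)
    theta = [random.betavariate(a[i], b[i]) for i in range(K)]
    leader = max(range(K), key=lambda i: theta[i])
    if random.random() < beta:
        return leader
    for _ in range(max_resample):
        theta = [random.betavariate(a[i], b[i]) for i in range(K)]
        challenger = max(range(K), key=lambda i: theta[i])
        if challenger != leader:
            return challenger
    return leader  # fallback when one arm dominates the posterior

print(ttts_choose(a=[5, 2, 8], b=[3, 6, 2]))
```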
no code implementations • 17 Mar 2019 • Maryam Aziz, Emilie Kaufmann, Marie-Karelle Riviere
We study the problem of finding the optimal dosage in early stage clinical trials through the multi-armed bandit lens.
no code implementations • 5 Feb 2019 • Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec
We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards.
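One ingredient is the klUCB index; below is a minimal bisection computation for Bernoulli rewards, with the exploration rate simplified to $\log t$ and the GLR restart mechanism omitted.

```python
import math

def kl_bernoulli(p, q):
    """Binary relative entropy kl(p, q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, iters=30):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t)."""
    target = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

print(klucb_index(mean=0.4, pulls=50, t=1000))
```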
no code implementations • 4 Feb 2019 • Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet
We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward.
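A minimal sketch of one round of this collision model, following the zero-reward-on-collision convention stated above; the Bernoulli rewards are an assumption of the example.

```python
import random

def play_round(choices, means):
    """choices[p] is the arm pulled by player p; players that picked the
    same arm all collide and get zero, a lone player gets a Bernoulli draw."""
    return [0.0 if choices.count(arm) > 1
            else float(random.random() < means[arm])
            for arm in choices]

print(play_round([0, 0, 2], means=[0.9, 0.5, 0.7]))  # players 0 and 1 collide
```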
no code implementations • 28 Nov 2018 • Emilie Kaufmann, Wouter Koolen
This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model.
no code implementations • NeurIPS 2018 • Emilie Kaufmann, Wouter Koolen, Aurelien Garivier
We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low versus a high true minimum.
no code implementations • 19 Mar 2018 • Lilian Besson, Emilie Kaufmann
In a broad setting, we prove that a geometric doubling trick can be used to conserve (minimax) bounds in $R_T = O(\sqrt{T})$ but cannot conserve (distribution-dependent) bounds in $R_T = O(\log T)$.
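A minimal sketch of the geometric doubling wrapper itself; the phase lengths $T_0 b^i$ and the restart interface are assumptions of the example.

```python
def geometric_doubling(run_for, total_horizon, T0=100, b=2):
    """Restart a horizon-tuned algorithm on phases of length T0, b*T0,
    b^2*T0, ..., truncating the last phase at the total horizon."""
    played, phase = 0, 0
    while played < total_horizon:
        horizon = min(T0 * b ** phase, total_horizon - played)
        run_for(horizon)  # fresh run of the base algorithm, tuned for `horizon`
        played += horizon
        phase += 1

# usage with a stub base algorithm that just reports its phase length
geometric_doubling(lambda T: print("phase of length", T), total_horizon=1000)
```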
no code implementations • 13 Mar 2018 • Maryam Aziz, Jesse Anderton, Emilie Kaufmann, Javed Aslam
We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution.
no code implementations • 7 Nov 2017 • Lilian Besson, Emilie Kaufmann
Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems.
no code implementations • 16 Aug 2017 • Pratik Gajane, Tanguy Urvoy, Emilie Kaufmann
In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on the observation of a transformation of these rewards through a stochastic corruption process with known parameters.
no code implementations • NeurIPS 2017 • Emilie Kaufmann, Wouter Koolen
Recent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and promise to resolve a range of challenging related problems.
1 code implementation • 31 Jan 2017 • Emilie Kaufmann, Aurélien Garivier
Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization.
no code implementations • 30 Jun 2016 • Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz
We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend.
no code implementations • NeurIPS 2016 • Aurélien Garivier, Emilie Kaufmann, Tor Lattimore
We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards.
no code implementations • 15 Feb 2016 • Aurélien Garivier, Emilie Kaufmann
We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems.
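In the notation of this line of work (with $\Sigma_K$ the probability simplex over arms, $\mathrm{Alt}(\mu)$ the set of bandit instances whose best arm differs from that of $\mu$, and $\mathrm{kl}$ the binary relative entropy), the characterization takes the form of a lower bound through a characteristic time:

$$\mathbb{E}_{\mu}[\tau_\delta] \ge T^{\star}(\mu)\,\mathrm{kl}(\delta, 1-\delta), \qquad T^{\star}(\mu)^{-1} = \sup_{w \in \Sigma_K}\, \inf_{\lambda \in \mathrm{Alt}(\mu)}\, \sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a),$$

and this bound is matched asymptotically by a Track-and-Stop strategy.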
no code implementations • 15 Feb 2016 • Aurélien Garivier, Emilie Kaufmann, Wouter Koolen
We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search.
no code implementations • 6 Jan 2016 • Emilie Kaufmann
This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem.
no code implementations • 12 Jun 2015 • Emilie Kaufmann, Thomas Bonald, Marc Lelarge
This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks.
no code implementations • 16 Jul 2014 • Emilie Kaufmann, Olivier Cappé, Aurélien Garivier
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning.
no code implementations • 13 May 2014 • Emilie Kaufmann, Olivier Cappé, Aurélien Garivier
A/B testing refers to the task of determining the best option between two alternatives that yield random outcomes.
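An illustrative sequential version of this task; the paired sampling, Hoeffding radius, and rewards in $[0,1]$ are our simplifications, not the optimal strategy derived in this paper.

```python
import math
import random

def ab_test(sample_a, sample_b, delta=0.05, max_pairs=100000):
    """Draw from both options in pairs and stop as soon as a
    time-uniform Hoeffding interval separates the empirical means."""
    sa = sb = 0.0
    for n in range(1, max_pairs + 1):
        sa += sample_a()
        sb += sample_b()
        # radius valid uniformly over time by a crude union bound
        rad = math.sqrt(math.log(4 * n * n / delta) / (2 * n))
        if abs(sa - sb) / n > 2 * rad:
            return "A" if sa > sb else "B"
    return "inconclusive"

print(ab_test(lambda: float(random.random() < 0.6),
              lambda: float(random.random() < 0.4)))
```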
no code implementations • NeurIPS 2013 • Nathaniel Korda, Emilie Kaufmann, Remi Munos
Thompson Sampling has been demonstrated to perform well in many complex bandit models; however, the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case.
1 code implementation • 18 May 2012 • Emilie Kaufmann, Nathaniel Korda, Rémi Munos
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.
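The rule in question, specialized to Bernoulli rewards with uniform Beta(1, 1) priors; this minimal simulation is ours, for illustration.

```python
import random

def thompson_sampling(means, horizon):
    """Sample a mean from each arm's Beta posterior, play the argmax,
    then update that arm's posterior with the observed reward."""
    K = len(means)
    wins = [1.0] * K    # Beta alpha parameters
    losses = [1.0] * K  # Beta beta parameters
    total = 0.0
    for _ in range(horizon):
        draws = [random.betavariate(wins[i], losses[i]) for i in range(K)]
        arm = max(range(K), key=lambda i: draws[i])
        r = float(random.random() < means[arm])
        wins[arm] += r
        losses[arm] += 1.0 - r
        total += r
    return total

print(thompson_sampling([0.2, 0.5, 0.8], horizon=10000))  # close to 0.8 * 10000
```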