Search Results for author: Emilie Kaufmann

Found 44 papers, 8 papers with code

Optimal Multi-Fidelity Best-Arm Identification

no code implementations 5 Jun 2024 Riccardo Poiani, Rémy Degenne, Emilie Kaufmann, Alberto Maria Metelli, Marcello Restelli

In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.

Power Mean Estimation in Stochastic Monte-Carlo Tree Search

no code implementations 4 Jun 2024 Tuan Dam, Odalric-Ambrym Maillard, Emilie Kaufmann

Monte-Carlo Tree Search (MCTS) is a widely-used strategy for online planning that combines Monte-Carlo sampling with forward tree search.

Finding good policies in average-reward Markov Decision Processes without prior knowledge

no code implementations 27 May 2024 Adrienne Tuynman, Rémy Degenne, Emilie Kaufmann

In this case, it is known that there exists an MDP with $D \simeq H$ for which the sample complexity to output an $\varepsilon$-optimal policy is $\Omega(SAD/\varepsilon^2)$ where $S$ and $A$ are the sizes of the state and action spaces.

Bandit Pareto Set Identification: the Fixed Budget Setting

no code implementations 7 Nov 2023 Cyrille Kone, Emilie Kaufmann, Laura Richert

We study a multi-objective pure exploration problem in a multi-armed bandit model.

Towards Instance-Optimality in Online PAC Reinforcement Learning

no code implementations 31 Oct 2023 Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

In this paper, we propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy in any tabular episodic MDP.

reinforcement-learning

Active Coverage for PAC Reinforcement Learning

no code implementations 23 Jun 2023 Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one.

reinforcement-learning Reinforcement Learning (RL)

Dealing with Unknown Variances in Best-Arm Identification

no code implementations 3 Oct 2022 Marc Jourdan, Rémy Degenne, Emilie Kaufmann

The problem of identifying the best arm among a collection of items having Gaussian rewards distribution is well understood when the variances are known.

Optimistic PAC Reinforcement Learning: the Instance-Dependent View

no code implementations 12 Jul 2022 Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view.

reinforcement-learning Reinforcement Learning (RL)

Top Two Algorithms Revisited

no code implementations 13 Jun 2022 Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide, Emilie Kaufmann

Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms.

Thompson Sampling Vocal Bursts Valence Prediction

Near-Optimal Collaborative Learning in Bandits

1 code implementation 31 May 2022 Clémence Réda, Sattar Vakili, Emilie Kaufmann

In this paper, we provide new lower bounds on the sample complexity of pure exploration and on the regret.

Federated Learning

Efficient Algorithms for Extreme Bandits

1 code implementation 21 Mar 2022 Dorian Baudry, Yoan Russac, Emilie Kaufmann

In this paper, we contribute to the Extreme Bandit problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward.

Multi-Armed Bandits

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

no code implementations 17 Mar 2022 Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$.

reinforcement-learning Reinforcement Learning (RL)

Top-m identification for linear bandits

1 code implementation 18 Mar 2021 Clémence Réda, Emilie Kaufmann, Andrée Delahaye-Duriez

Motivated by an application to drug repurposing, we propose the first algorithms to tackle the identification of the $m \ge 1$ arms with largest means in a linear bandit model, in the fixed-confidence setting.

Optimal Thompson Sampling strategies for support-aware CVaR bandits

1 code implementation 10 Dec 2020 Dorian Baudry, Romain Gautron, Emilie Kaufmann, Odalric-Ambrym Maillard

In this paper we study a multi-arm bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward distribution.

Thompson Sampling

Sub-sampling for Efficient Non-Parametric Bandit Exploration

1 code implementation NeurIPS 2020 Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard

In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions).

Thompson Sampling

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

no code implementations 7 Oct 2020 Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode.

reinforcement-learning Reinforcement Learning (RL)

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

no code implementations 9 Jul 2020 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric.

reinforcement-learning Reinforcement Learning (RL)

Adaptive Reward-Free Exploration

no code implementations 11 Jun 2020 Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko

Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel.

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

no code implementations NeurIPS 2020 Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support.

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

1 code implementation 12 Apr 2020 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric.

reinforcement-learning Reinforcement Learning (RL)

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

no code implementations 6 Dec 2019 Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes

Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization problems over rank-one matrices of arms.

Thompson Sampling

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

no code implementations 24 Oct 2019 Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko

We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS).

Thompson Sampling

On Multi-Armed Bandit Designs for Dose-Finding Clinical Trials

no code implementations 17 Mar 2019 Maryam Aziz, Emilie Kaufmann, Marie-Karelle Riviere

We study the problem of finding the optimal dosage in early stage clinical trials through the multi-armed bandit lens.

Thompson Sampling

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

no code implementations 5 Feb 2019 Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec

We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards.

Change Point Detection

A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

no code implementations 4 Feb 2019 Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet

We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward.

Open-Ended Question Answering

Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals

no code implementations 28 Nov 2018 Emilie Kaufmann, Wouter Koolen

This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model.

valid

Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling

no code implementations NeurIPS 2018 Emilie Kaufmann, Wouter Koolen, Aurélien Garivier

We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum.

Reinforcement Learning (RL) Thompson Sampling

What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

no code implementations 19 Mar 2018 Lilian Besson, Emilie Kaufmann

In a broad setting, we prove that a geometric doubling trick can be used to conserve (minimax) bounds in $R_T = O(\sqrt{T})$ but cannot conserve (distribution-dependent) bounds in $R_T = O(\log T)$.

Multi-Armed Bandits

Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence

no code implementations 13 Mar 2018 Maryam Aziz, Jesse Anderton, Emilie Kaufmann, Javed Aslam

We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution.

Multi-Player Bandits Revisited

no code implementations 7 Nov 2017 Lilian Besson, Emilie Kaufmann

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems.

Multi-Armed Bandits

Corrupt Bandits for Preserving Local Privacy

no code implementations 16 Aug 2017 Pratik Gajane, Tanguy Urvoy, Emilie Kaufmann

In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on the observation of transformation of these rewards through a stochastic corruption process with known parameters.

Recommendation Systems

Monte-Carlo Tree Search by Best Arm Identification

no code implementations NeurIPS 2017 Emilie Kaufmann, Wouter Koolen

Recent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and are promising the resolution of a range of challenging related problems.

Learning the distribution with largest mean: two bandit frameworks

1 code implementation 31 Jan 2017 Emilie Kaufmann, Aurélien Garivier

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization.

BIG-bench Machine Learning Vocal Bursts Valence Prediction

Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits

no code implementations 30 Jun 2016 Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz

We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend.

Thompson Sampling

On Explore-Then-Commit Strategies

no code implementations NeurIPS 2016 Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards.

Optimal Best Arm Identification with Fixed Confidence

no code implementations 15 Feb 2016 Aurélien Garivier, Emilie Kaufmann

We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems.

Maximin Action Identification: A New Bandit Framework for Games

no code implementations 15 Feb 2016 Aurélien Garivier, Emilie Kaufmann, Wouter Koolen

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search.

On Bayesian index policies for sequential resource allocation

no code implementations 6 Jan 2016 Emilie Kaufmann

This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem.

A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks

no code implementations 12 Jun 2015 Emilie Kaufmann, Thomas Bonald, Marc Lelarge

This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks.

Clustering

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

no code implementations 16 Jul 2014 Emilie Kaufmann, Olivier Cappé, Aurélien Garivier

The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning.

LEMMA

On the Complexity of A/B Testing

no code implementations 13 May 2014 Emilie Kaufmann, Olivier Cappé, Aurélien Garivier

A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes.

Thompson Sampling for 1-Dimensional Exponential Family Bandits

no code implementations NeurIPS 2013 Nathaniel Korda, Emilie Kaufmann, Rémi Munos

Thompson Sampling has been demonstrated in many complex bandit models; however, the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case.

Thompson Sampling

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

1 code implementation 18 May 2012 Emilie Kaufmann, Nathaniel Korda, Rémi Munos

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.

3D Reconstruction Thompson Sampling
