Search Results for author: Emilie Kaufmann

Found 41 papers, 8 papers with code

Bandit Pareto Set Identification: the Fixed Budget Setting

no code implementations • 7 Nov 2023 • Cyrille Kone, Emilie Kaufmann, Laura Richert

We study a multi-objective pure exploration problem in a multi-armed bandit model.

Towards Instance-Optimality in Online PAC Reinforcement Learning

no code implementations • 31 Oct 2023 • Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

In this paper, we propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy in any tabular episodic MDP.

Reinforcement Learning (RL)

Active Coverage for PAC Reinforcement Learning

no code implementations • 23 Jun 2023 • Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one.

Reinforcement Learning (RL)

Dealing with Unknown Variances in Best-Arm Identification

no code implementations • 3 Oct 2022 • Marc Jourdan, Rémy Degenne, Emilie Kaufmann

The problem of identifying the best arm among a collection of items with Gaussian reward distributions is well understood when the variances are known.

Optimistic PAC Reinforcement Learning: the Instance-Dependent View

no code implementations • 12 Jul 2022 • Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view.

Reinforcement Learning (RL)

Top Two Algorithms Revisited

no code implementations • 13 Jun 2022 • Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide, Emilie Kaufmann

Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms.
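
As a minimal sketch of the idea for Bernoulli arms with uniform Beta priors (variable names are illustrative; the paper analyzes several leader and challenger choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def top_two_ts_arm(successes, failures, beta=0.5):
    """Pick the next arm with a Top-Two Thompson Sampling rule.

    successes, failures: per-arm counts defining Beta posteriors.
    beta: probability of playing the leader instead of the challenger.
    """
    # Leader: argmax of one posterior sample.
    leader = int(np.argmax(rng.beta(successes + 1, failures + 1)))
    if rng.random() < beta:
        return leader
    # Challenger: re-sample the posterior until the argmax differs.
    while True:
        challenger = int(np.argmax(rng.beta(successes + 1, failures + 1)))
        if challenger != leader:
            return challenger
```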

Thompson Sampling

Near-Optimal Collaborative Learning in Bandits

1 code implementation • 31 May 2022 • Clémence Réda, Sattar Vakili, Emilie Kaufmann

In this paper, we provide new lower bounds on the sample complexity of pure exploration and on the regret.

Federated Learning

Efficient Algorithms for Extreme Bandits

1 code implementation • 21 Mar 2022 • Dorian Baudry, Yoan Russac, Emilie Kaufmann

In this paper, we contribute to the Extreme Bandit problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward.
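
A toy illustration of why this objective differs from classical regret minimization (a sketch, not the paper's algorithm): an arm with a lower mean but a heavier tail can dominate the maximum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two arms: a light-tailed arm with the higher mean, and a heavy-tailed
# (Pareto) arm with a lower mean but much larger extreme values.
light = rng.normal(loc=1.0, scale=0.1, size=10_000)
heavy = rng.pareto(a=2.5, size=10_000)   # mean 1/(a-1) ≈ 0.67

print(light.mean() > heavy.mean())  # True: classical MAB prefers `light`
print(heavy.max() > light.max())    # True: the extreme objective prefers `heavy`
```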

Multi-Armed Bandits

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

no code implementations • 17 Mar 2022 • Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$.
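
Concretely, under the standard formalization, a returned policy $\hat{\pi}$ is $\epsilon$-optimal when $V^{\hat{\pi}}(s_1) \ge V^{\star}(s_1) - \epsilon$, and a $(\epsilon,\delta)$-PAC algorithm must output such a policy with probability at least $1-\delta$; the number of exploration episodes it uses before stopping is its sample complexity.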

Reinforcement Learning (RL)

Top-m identification for linear bandits

1 code implementation • 18 Mar 2021 • Clémence Réda, Emilie Kaufmann, Andrée Delahaye-Duriez

Motivated by an application to drug repurposing, we propose the first algorithms to tackle the identification of the $m \ge 1$ arms with the largest means in a linear bandit model, in the fixed-confidence setting.

Optimal Thompson Sampling strategies for support-aware CVaR bandits

1 code implementation • 10 Dec 2020 • Dorian Baudry, Romain Gautron, Emilie Kaufmann, Odalric-Ambrym Maillard

In this paper we study a multi-armed bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level $\alpha$ of the reward distribution.
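
As a hedged illustration of the criterion (conventions for CVaR differ across papers; this uses the lower-tail convention suited to risk-averse rewards):

```python
import numpy as np

def empirical_cvar(samples, alpha):
    """Empirical CVaR at level alpha: the average of the worst
    (lowest) alpha-fraction of observed rewards, an estimate of
    E[X | X <= VaR_alpha(X)]."""
    x = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(alpha * len(x))))
    return x[:k].mean()

rewards = np.random.default_rng(0).normal(loc=1.0, size=1_000)
print(empirical_cvar(rewards, alpha=0.05))  # mean of the worst 5%
```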

Thompson Sampling

Sub-sampling for Efficient Non-Parametric Bandit Exploration

1 code implementation • NeurIPS 2020 • Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard

In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions).
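
A loose sketch of the re-sampling duel underlying such algorithms (the paper specifies the sub-sampling scheme and leader definition precisely; names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def challenger_wins_duel(leader_rewards, challenger_rewards):
    """Compare a challenger's empirical mean against the mean of a random
    sub-sample of the leader's reward history of the same size, so that
    arms are compared on equal sample sizes without confidence bonuses.
    Assumes the leader has at least as many observations as the challenger."""
    n = len(challenger_rewards)
    sub = rng.choice(leader_rewards, size=n, replace=False)
    return np.mean(challenger_rewards) >= np.mean(sub)
```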

Thompson Sampling

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

no code implementations • 7 Oct 2020 • Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode.

Reinforcement Learning (RL)

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

no code implementations • 9 Jul 2020 • Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric.

Reinforcement Learning (RL)

Adaptive Reward-Free Exploration

no code implementations • 11 Jun 2020 • Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko

Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel.

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

no code implementations • NeurIPS 2020 • Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support.

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

1 code implementation • 12 Apr 2020 • Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric.
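
The estimates at the core of such algorithms can be sketched as kernel smoothing over past observations (a sketch of the idea only; the paper builds optimistic exploration bonuses on top of such estimates):

```python
import numpy as np

def kernel_reward_estimate(query, past_sa, past_rewards, bandwidth):
    """Nadaraya-Watson-style estimate of the mean reward at a query
    state-action pair, weighting past observations by a Gaussian kernel
    of their metric distance to the query."""
    d = np.linalg.norm(past_sa - query, axis=1)   # distances in the metric
    w = np.exp(-(d / bandwidth) ** 2)             # kernel weights
    return float(w @ past_rewards) / max(float(w.sum()), 1e-12)
```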

Reinforcement Learning (RL)

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

no code implementations • 6 Dec 2019 • Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes

Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization problems over rank-one matrices of arms.
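
A small sketch of the structure being exploited (illustrative parameters): the matrix of Bernoulli means factorizes as an outer product, so the best entry sits at the intersection of the best row and the best column.

```python
import numpy as np

rng = np.random.default_rng(0)

u = np.array([0.2, 0.7, 0.5])   # row factors
v = np.array([0.4, 0.9])        # column factors
means = np.outer(u, v)          # rank-one matrix: E[X_ij] = u_i * v_j

i, j = np.unravel_index(means.argmax(), means.shape)
assert (i, j) == (u.argmax(), v.argmax())   # best row meets best column
reward = rng.binomial(1, means[i, j])       # one Bernoulli pull of arm (i, j)
```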

Thompson Sampling

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

no code implementations • 24 Oct 2019 • Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko

We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS).

Thompson Sampling

On Multi-Armed Bandit Designs for Dose-Finding Clinical Trials

no code implementations • 17 Mar 2019 • Maryam Aziz, Emilie Kaufmann, Marie-Karelle Riviere

We study the problem of finding the optimal dosage in early stage clinical trials through the multi-armed bandit lens.

Thompson Sampling

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

no code implementations • 5 Feb 2019 • Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec

We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards.
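
For the klUCB half, a minimal sketch of the Bernoulli kl-UCB index computed by bisection (the paper couples such indices with a GLR change-point detector that restarts the counts):

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, tol=1e-6):
    """max{q >= mean : pulls * kl(mean, q) <= log(t)}, by bisection."""
    level = np.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if kl_bernoulli(mean, mid) <= level else (lo, mid)
    return lo
```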

Change Point Detection

A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

no code implementations • 4 Feb 2019 • Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet

We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward.
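
The collision model from the abstract can be sketched directly (a toy simulation of the setting, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def play_round(choices, means):
    """choices[j]: arm pulled by player j. Players that collide on an
    arm receive zero; a player alone on an arm gets a Bernoulli reward."""
    arms, counts = np.unique(choices, return_counts=True)
    collided = set(arms[counts > 1])
    return [0 if a in collided else rng.binomial(1, means[a]) for a in choices]

print(play_round([0, 0, 2], means=[0.9, 0.5, 0.3]))  # players 0 and 1 collide
```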

Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals

no code implementations • 28 Nov 2018 • Emilie Kaufmann, Wouter Koolen

This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model.

Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling

no code implementations • NeurIPS 2018 • Emilie Kaufmann, Wouter Koolen, Aurélien Garivier

We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low versus a high true minimum.

Thompson Sampling

What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

no code implementations • 19 Mar 2018 • Lilian Besson, Emilie Kaufmann

In a broad setting, we prove that a geometric doubling trick can be used to conserve (minimax) bounds in $R_T = O(\sqrt{T})$ but cannot conserve (distribution-dependent) bounds in $R_T = O(\log T)$.
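
The contrast can be checked with a back-of-the-envelope computation over geometric phases $T_i = T_0 b^i$ (a sketch of the intuition, not the paper's exact statement). With $n = \Theta(\log T)$ phases,

$$\sum_{i=0}^{n} \sqrt{T_i} = \sqrt{T_0}\,\frac{b^{(n+1)/2} - 1}{\sqrt{b} - 1} = O\big(\sqrt{T}\big),
\qquad
\sum_{i=0}^{n} \log T_i = (n+1)\log T_0 + \frac{n(n+1)}{2}\log b = \Theta\big((\log T)^2\big),$$

so the per-phase $\sqrt{T_i}$ costs sum geometrically while the $\log T_i$ costs accumulate an extra $\log T$ factor.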

Multi-Armed Bandits

Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence

no code implementations • 13 Mar 2018 • Maryam Aziz, Jesse Anderton, Emilie Kaufmann, Javed Aslam

We consider the problem of near-optimal arm identification in the fixed-confidence setting of the infinitely-armed bandit problem when nothing is known about the arm reservoir distribution.

Multi-Player Bandits Revisited

no code implementations • 7 Nov 2017 • Lilian Besson, Emilie Kaufmann

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems.

Multi-Armed Bandits

Corrupt Bandits for Preserving Local Privacy

no code implementations • 16 Aug 2017 • Pratik Gajane, Tanguy Urvoy, Emilie Kaufmann

In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based only on observations of these rewards after they pass through a stochastic corruption process with known parameters.
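
One classical corruption process with known parameters is randomized response, which also illustrates how a learner can de-bias what it observes (a hedged sketch of the setting, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.8                                        # known prob. of passing a reward through
true_rewards = rng.binomial(1, 0.6, 10_000)    # Bernoulli rewards, mean 0.6
keep = rng.random(10_000) < p
observed = np.where(keep, true_rewards, 1 - true_rewards)  # flipped otherwise

# E[observed] = p*mu + (1-p)*(1-mu), hence an unbiased mean estimate:
mu_hat = (observed.mean() - (1 - p)) / (2 * p - 1)
print(mu_hat)   # close to the true mean 0.6
```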

Recommendation Systems

Monte-Carlo Tree Search by Best Arm Identification

no code implementations • NeurIPS 2017 • Emilie Kaufmann, Wouter Koolen

Recent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and are promising the resolution of a range of challenging related problems.

Learning the distribution with largest mean: two bandit frameworks

1 code implementation • 31 Jan 2017 • Emilie Kaufmann, Aurélien Garivier

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization.

Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits

no code implementations • 30 Jun 2016 • Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz

We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend.

Thompson Sampling

On Explore-Then-Commit Strategies

no code implementations • NeurIPS 2016 • Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards.
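
A minimal sketch of the Explore-Then-Commit template under study (illustrative; the paper's contribution concerns how well the best choice of the exploration length m can perform):

```python
import numpy as np

def explore_then_commit(pull, m, horizon):
    """Two-armed ETC: pull each arm m times, then commit to the
    empirically best arm for the remaining horizon - 2m rounds.
    `pull(a)` returns one reward from arm a (e.g. Gaussian)."""
    samples = [[pull(a) for _ in range(m)] for a in (0, 1)]
    best = int(np.mean(samples[1]) > np.mean(samples[0]))
    total = float(np.sum(samples))
    for _ in range(horizon - 2 * m):
        total += pull(best)
    return total

rng = np.random.default_rng(0)
print(explore_then_commit(lambda a: rng.normal((0.0, 0.5)[a]), m=100, horizon=10_000))
```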

Maximin Action Identification: A New Bandit Framework for Games

no code implementations • 15 Feb 2016 • Aurélien Garivier, Emilie Kaufmann, Wouter Koolen

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search.

Optimal Best Arm Identification with Fixed Confidence

no code implementations • 15 Feb 2016 • Aurélien Garivier, Emilie Kaufmann

We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems.
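
For reference, the characteristic time governing the sample complexity in this line of work takes a max-min form (up to notational details):

$$T^{\star}(\mu)^{-1} = \sup_{w \in \Sigma_K} \; \inf_{\lambda \in \mathrm{Alt}(\mu)} \; \sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a),$$

where $\Sigma_K$ is the simplex of sampling proportions and $\mathrm{Alt}(\mu)$ is the set of models whose best arm differs from that of $\mu$.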

On Bayesian index policies for sequential resource allocation

no code implementations • 6 Jan 2016 • Emilie Kaufmann

This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem.
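
One such index is the Bayes-UCB quantile index; a minimal sketch for Bernoulli arms with a uniform prior (the quantile schedule analyzed in this literature differs by logarithmic refinements):

```python
from scipy.stats import beta

def bayes_ucb_index(successes, failures, t):
    """Quantile of order 1 - 1/t of the Beta posterior of the arm's
    mean: a Bayesian analogue of an upper confidence bound."""
    return beta.ppf(1.0 - 1.0 / t, successes + 1, failures + 1)

# At round t, play the arm with the largest bayes_ucb_index(s_a, f_a, t).
```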

A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks

no code implementations • 12 Jun 2015 • Emilie Kaufmann, Thomas Bonald, Marc Lelarge

This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks.

Clustering

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

no code implementations • 16 Jul 2014 • Emilie Kaufmann, Olivier Cappé, Aurélien Garivier

The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning.

On the Complexity of A/B Testing

no code implementations • 13 May 2014 • Emilie Kaufmann, Olivier Cappé, Aurélien Garivier

A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes.

Thompson Sampling for 1-Dimensional Exponential Family Bandits

no code implementations • NeurIPS 2013 • Nathaniel Korda, Emilie Kaufmann, Rémi Munos

Thompson Sampling has been demonstrated in many complex bandit models; however, the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case.

Thompson Sampling

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

1 code implementation • 18 May 2012 • Emilie Kaufmann, Nathaniel Korda, Rémi Munos

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.
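
For reference, the Bernoulli case that these guarantees cover admits a very short implementation with Beta posteriors (a standard sketch, with illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_sampling(means, horizon):
    """Thompson Sampling for Bernoulli arms with Beta(1, 1) priors:
    sample each posterior once, play the argmax, update the counts."""
    k = len(means)
    s, f = np.zeros(k), np.zeros(k)          # success / failure counts
    for _ in range(horizon):
        a = int(np.argmax(rng.beta(s + 1, f + 1)))
        r = rng.binomial(1, means[a])
        s[a] += r
        f[a] += 1 - r
    return s + f                             # pulls per arm

print(thompson_sampling([0.3, 0.5, 0.7], horizon=5_000))  # concentrates on the best arm
```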

Thompson Sampling
