Search Results for author: Richard Combes

Found 20 papers, 6 papers with code

Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

no code implementations • 24 Mar 2021 • Wei Huang, Richard Combes, Cindy Trinh

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information.

A High Performance, Low Complexity Algorithm for Multi-Player Bandits Without Collision Sensing Information

1 code implementation • 19 Feb 2021 • Cindy Trinh, Richard Combes

Motivated by applications in cognitive radio networks, we consider the decentralized multi-player multi-armed bandit problem, with neither collision nor sensing information.

Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

1 code implementation • 14 Feb 2021 • Thibaut Cuvelier, Richard Combes, Eric Gourdin

We consider combinatorial semi-bandits with uncorrelated Gaussian rewards.

On the Suboptimality of Thompson Sampling in High Dimensions

1 code implementation • NeurIPS 2021 • Raymond Zhang, Richard Combes

In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits.

Thompson Sampling

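The Thompson Sampling setting in this entry can be illustrated with a toy simulation. The sketch below runs a generic Gaussian Thompson Sampling policy on a combinatorial semi-bandit whose actions are all sets of exactly `m` out of `d` items, with one noisy reward observed per selected item. The unit-variance Gaussian rewards, the N(0, 1) prior, the top-`m` action set and the function name `ts_top_m` are assumptions made for illustration; this is not the paper's experimental setup and says nothing about its high-dimensional analysis.

```python
import numpy as np


def ts_top_m(mu, m, horizon, seed=0):
    """Gaussian Thompson Sampling on a top-m semi-bandit (illustrative sketch).

    Assumptions (not from the paper): item rewards are N(mu_i, 1), the prior
    on each item mean is N(0, 1), and any set of exactly m items is an action.
    Returns the number of times each item was played.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    d = mu.size
    sums = np.zeros(d)    # sum of observed rewards per item
    counts = np.zeros(d)  # number of observations per item
    for _ in range(horizon):
        # Conjugate Gaussian posterior: precision 1 + n_i, mean sums_i / (1 + n_i).
        post_var = 1.0 / (1.0 + counts)
        post_mean = sums * post_var
        # Draw one sample of each item mean from its posterior.
        theta = rng.normal(post_mean, np.sqrt(post_var))
        # Play the m items with the largest sampled means.
        action = np.argsort(theta)[-m:]
        # Semi-bandit feedback: one noisy reward per selected item.
        rewards = rng.normal(mu[action], 1.0)
        sums[action] += rewards
        counts[action] += 1
    return counts
```

For example, `ts_top_m([0.9, 0.8, 0.1, 0.0, 0.2], m=2, horizon=2000)` returns per-item play counts, which on such an easy instance should concentrate on items 0 and 1.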

Solving Random Parity Games in Polynomial Time

no code implementations • 16 Jul 2020 • Richard Combes, Mikael Touati

We further propose the SWCP (Self-Winning Cycles Propagation) algorithm and show that, when the degree is large enough, SWCP solves the game with high probability.

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

1 code implementation • 17 Feb 2020 • Thibaut Cuvelier, Richard Combes, Eric Gourdin

We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0, 1\}^d$ where rewards are uncorrelated across items.

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

no code implementations • 6 Dec 2019 • Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes

Stochastic Rank-One Bandits (Katariya et al., 2017a, b) are a simple framework for regret minimization problems over rank-one matrices of arms.

Thompson Sampling

Computationally Efficient Estimation of the Spectral Gap of a Markov Chain

no code implementations • 15 Jun 2018 • Richard Combes, Mikael Touati

We consider the problem of estimating from sample paths the absolute spectral gap $\gamma_*$ of a reversible, irreducible and aperiodic Markov chain $(X_t)_{t \in \mathbb{N}}$ over a finite state space $\Omega$.

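To make the estimated quantity concrete: for a reversible transition matrix $P$, the absolute spectral gap is $\gamma_* = 1 - \max\{|\lambda| : \lambda \ne 1\}$, the maximum being over the eigenvalues $\lambda$ of $P$. The sketch below is a naive plug-in estimator (build the empirical transition matrix from one sample path, then compute its absolute spectral gap). It is not the computationally efficient estimator proposed in the paper, and the function names `absolute_spectral_gap`, `empirical_transition_matrix` and `simulate_chain` are hypothetical.

```python
import numpy as np


def absolute_spectral_gap(P):
    """Return 1 - max |lambda| over the eigenvalues of P other than 1.

    Assumes P is the transition matrix of an irreducible, aperiodic chain,
    so the eigenvalue 1 is simple and all others have modulus < 1.
    """
    eig = np.linalg.eigvals(np.asarray(P, dtype=float))
    # Drop the eigenvalue closest to 1 (the one tied to the stationary law).
    others = np.delete(eig, np.argmin(np.abs(eig - 1.0)))
    return 1.0 - np.max(np.abs(others))


def empirical_transition_matrix(path, n_states):
    """Count observed transitions X_t -> X_{t+1} and normalise each row."""
    path = np.asarray(path)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (path[:-1], path[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state must be left at least once in the path")
    return counts / row_sums


def simulate_chain(P, n_steps, seed=0):
    """Draw a sample path of length n_steps started from state 0."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    path = np.empty(n_steps, dtype=int)
    path[0] = 0
    for t in range(1, n_steps):
        path[t] = rng.choice(P.shape[0], p=P[path[t - 1]])
    return path
```

For the two-state chain with `P = [[0.7, 0.3], [0.2, 0.8]]` the eigenvalues are 1 and 0.5, so `absolute_spectral_gap(P)` returns 0.5 up to floating-point error, and the plug-in estimate from a long simulated path should land close to it.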

Minimal Exploration in Structured Stochastic Bandits

no code implementations • NeurIPS 2017 • Richard Combes, Stefan Magureanu, Alexandre Proutiere

This paper introduces and addresses a wide class of stochastic bandit problems where the function mapping the arm to the corresponding reward exhibits some known structural properties.

Thompson Sampling

Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles

1 code implementation • 3 Mar 2017 • Jung-hun Kim, Se-Young Yun, Minchan Jeong, Jun Hyun Nam, Jinwoo Shin, Richard Combes

This implies that classical approaches cannot guarantee a non-trivial regret bound.

Multi-Armed Bandits

A Minimax Optimal Algorithm for Crowdsourcing

no code implementations • NeurIPS 2017 • Thomas Bonald, Richard Combes

We further propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers.

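The idea behind estimating worker reliability from label correlations can be illustrated with a "triangular" identity. Assume a symmetric one-coin model in which worker i returns the true label in {-1, +1} with probability p_i, independently across workers. Then E[X_i X_j] = θ_i θ_j for i ≠ j, where θ_i = 2p_i - 1, so θ_i² = E[X_i X_j] E[X_i X_k] / E[X_j X_k] for any distinct i, j, k. The sketch below plugs empirical correlations into this identity, choosing for each worker the pair (j, k) with the largest empirical |C_jk|. The model, this pair-selection rule, the requirement that every worker labels every item, and the assumption that all workers are better than random (θ_i > 0) are ours; treat it as an approximation in the spirit of TE, not the paper's exact procedure. `triangular_reliability` is a hypothetical name.

```python
import itertools

import numpy as np


def triangular_reliability(labels):
    """Estimate theta_i = 2 p_i - 1 from an (n_items, n_workers) matrix of ±1 labels.

    Illustrative sketch: assumes at least 3 workers, that every worker labels
    every item, and that theta_i > 0, so only the magnitude is estimated.
    """
    X = np.asarray(labels, dtype=float)
    n_items, n_workers = X.shape
    C = X.T @ X / n_items  # empirical E[X_i X_j]
    theta = np.zeros(n_workers)
    for i in range(n_workers):
        others = [w for w in range(n_workers) if w != i]
        # Pick the most correlated pair (j, k) not involving worker i.
        j, k = max(itertools.combinations(others, 2), key=lambda jk: abs(C[jk]))
        ratio = C[i, j] * C[i, k] / C[j, k]
        theta[i] = np.sqrt(max(ratio, 0.0))  # clip small negative noise to 0
    return theta
```

Choosing the most correlated pair keeps the denominator C_jk away from zero, which limits how much sampling noise is amplified by the division.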

A Streaming Algorithm for Crowdsourced Data Classification

no code implementations • 23 Feb 2016 • Thomas Bonald, Richard Combes

We propose a streaming algorithm for the binary classification of data based on crowdsourcing.

Binary Classification, Classification

An extension of McDiarmid's inequality

no code implementations • 17 Nov 2015 • Richard Combes

We generalize McDiarmid's inequality for functions with bounded differences on a high probability set, using an extension argument.

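For reference, the classical statement being generalized: if $X_1, \dots, X_n$ are independent and $f$ changes by at most $c_i$ whenever only its $i$-th argument changes, McDiarmid's inequality gives the tail bound below. The extension in the paper relaxes the requirement that this bounded-differences condition hold everywhere; its exact form is not reproduced here.

```latex
\Pr\big( f(X_1,\dots,X_n) - \mathbb{E}[f(X_1,\dots,X_n)] \ge t \big)
  \le \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \right), \qquad t > 0.
```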

Combinatorial Bandits Revisited

1 code implementation • NeurIPS 2015 • Richard Combes, M. Sadegh Talebi, Alexandre Proutiere, Marc Lelarge

In the adversarial setting under bandit feedback, we propose CombEXP, an algorithm with the same regret scaling as state-of-the-art algorithms, but with lower computational complexity for some combinatorial problems.

Unimodal Bandits without Smoothness

no code implementations • 28 Jun 2014 • Richard Combes, Alexandre Proutiere

To our knowledge, the SP algorithm is the first sequential arm selection rule whose regret and optimization error scale as $O(\sqrt{T})$ and $O(1/\sqrt{T})$, respectively (up to a logarithmic factor), both for non-smooth expected reward functions and for smooth functions with unknown smoothness.

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

no code implementations • 20 May 2014 • Richard Combes, Alexandre Proutiere

We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards smoothly evolve over time.

Multi-Armed Bandits

Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

no code implementations • 19 May 2014 • Stefan Magureanu, Richard Combes, Alexandre Proutiere

For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem.

Multi-Armed Bandits

Dynamic Rate and Channel Selection in Cognitive Radio Systems

no code implementations • 23 Feb 2014 • Richard Combes, Alexandre Proutiere

In turn, the proposed algorithms optimally exploit the inherent structure of the throughput.

Stochastic Online Shortest Path Routing: The Value of Feedback

no code implementations • 27 Sep 2013 • M. Sadegh Talebi, Zhenhua Zou, Richard Combes, Alexandre Proutiere, Mikael Johansson

The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays.

The association problem in wireless networks: a Policy Gradient Reinforcement Learning approach

no code implementations • 11 Jun 2013 • Richard Combes, Ilham El Bouloumi, Stephane Senecal, Zwi Altman

The purpose of this paper is to develop a self-optimized association algorithm based on PGRL (Policy Gradient Reinforcement Learning) that is scalable, stable and robust.

Q-Learning, Reinforcement Learning
