no code implementations • 24 Mar 2021 • Wei Huang, Richard Combes, Cindy Trinh
We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information.
1 code implementation • 19 Feb 2021 • Cindy Trinh, Richard Combes
Motivated by applications in cognitive radio networks, we consider the decentralized multi-player multi-armed bandit problem, without collision or sensing information.
1 code implementation • 14 Feb 2021 • Thibaut Cuvelier, Richard Combes, Eric Gourdin
We consider combinatorial semi-bandits with uncorrelated Gaussian rewards.
1 code implementation • NeurIPS 2021 • Raymond Zhang, Richard Combes
In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits.
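To make the setting concrete, here is a minimal sketch of generic Thompson Sampling on one simple combinatorial semi-bandit. It is not the algorithm or prior analyzed in the paper. The setup is assumed for illustration: the learner picks $m$ of $d$ items per round, observes each chosen item's Bernoulli reward (semi-bandit feedback), and keeps an independent Beta(1, 1) prior per item. The function name `ts_top_m` and its parameters are hypothetical.

```python
import random

def ts_top_m(means, m, horizon, seed=0):
    """Generic Thompson Sampling sketch for an m-set semi-bandit (hypothetical setup).

    Each round: sample one mean per item from its Beta posterior, play the m items
    with the largest samples, observe each played item's Bernoulli reward.
    """
    rng = random.Random(seed)
    d = len(means)
    alpha = [1] * d  # 1 + observed successes per item (Beta(1, 1) prior)
    beta = [1] * d   # 1 + observed failures per item
    total = 0
    for _ in range(horizon):
        # Posterior sample for every item.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(d)]
        # Linear oracle over the m-set action space: keep the m largest samples.
        action = sorted(range(d), key=lambda i: theta[i], reverse=True)[:m]
        # Semi-bandit feedback: every chosen item's reward is observed separately.
        for i in action:
            r = 1 if rng.random() < means[i] else 0
            alpha[i] += r
            beta[i] += 1 - r
            total += r
    return total, alpha, beta
```

For example, `ts_top_m([0.9, 0.8, 0.2, 0.1], m=2, horizon=2000)` should concentrate its plays on the first two items; the number of plays of item `i` is `alpha[i] + beta[i] - 2`.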
no code implementations • 16 Jul 2020 • Richard Combes, Mikael Touati
We further propose the SWCP (Self-Winning Cycles Propagation) algorithm and show that, when the degree is large enough, SWCP solves the game with high probability.
1 code implementation • 17 Feb 2020 • Thibaut Cuvelier, Richard Combes, Eric Gourdin
We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0, 1\}^d$ where rewards are uncorrelated across items.
no code implementations • 6 Dec 2019 • Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes
Stochastic Rank-One Bandits (Katariya et al., 2017a, b) are a simple framework for regret minimization problems over rank-one matrices of arms.
no code implementations • 15 Jun 2018 • Richard Combes, Mikael Touati
We consider the problem of estimating from sample paths the absolute spectral gap $\gamma_*$ of a reversible, irreducible and aperiodic Markov chain $(X_t)_{t \in \mathbb{N}}$ over a finite state space $\Omega$.
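As an illustration of the quantity being estimated, the sketch below computes a simple plug-in estimate of $\gamma_* = 1 - \max\{|\lambda| : \lambda \text{ eigenvalue of } P,\ \lambda \neq 1\}$ from a single sample path. It is a generic estimator, not necessarily the one studied in the paper. Assumptions: states are labelled `0..n_states-1`, every state is left at least once along the path, and the chain is reversible, so $D^{1/2} P D^{-1/2}$ (with $D = \mathrm{diag}(\pi)$) is symmetric. The function name is hypothetical.

```python
import numpy as np

def plugin_abs_spectral_gap(path, n_states):
    """Plug-in estimate of the absolute spectral gap from one sample path (sketch)."""
    path = np.asarray(path)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (path[:-1], path[1:]), 1.0)  # empirical transition counts
    P_hat = counts / counts.sum(axis=1, keepdims=True)
    pi_hat = np.bincount(path, minlength=n_states) / len(path)
    # For a reversible chain, D^{1/2} P D^{-1/2} is symmetric; symmetrize the
    # estimate so that its eigenvalues are real.
    s = np.sqrt(pi_hat)
    L = (s[:, None] * P_hat) / s[None, :]
    L_sym = (L + L.T) / 2
    eig = np.sort(np.abs(np.linalg.eigvalsh(L_sym)))
    # The largest |eigenvalue| approximates lambda_1 = 1; gamma_* = 1 - the next one.
    return 1.0 - eig[-2]
```

For a two-state chain with transition matrix $\begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}$, the eigenvalues are $1$ and $0.5$, so $\gamma_* = 0.5$, and the estimate on a long simulated path should be close to that value.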
no code implementations • NeurIPS 2017 • Richard Combes, Stefan Magureanu, Alexandre Proutiere
This paper introduces and addresses a wide class of stochastic bandit problems where the function mapping the arm to the corresponding reward exhibits some known structural properties.
1 code implementation • 3 Mar 2017 • Jung-hun Kim, Se-Young Yun, Minchan Jeong, Jun Hyun Nam, Jinwoo Shin, Richard Combes
This implies that classical approaches cannot guarantee a non-trivial regret bound.
no code implementations • NeurIPS 2017 • Thomas Bonald, Richard Combes
We further propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers.
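A minimal sketch of the triangle identity that such an estimator can build on, under an assumed symmetric one-coin model: true labels $G \in \{-1, +1\}$ are uniform, worker $i$ answers correctly with probability $p_i$ independently, and $\theta_i = 2p_i - 1$. For distinct workers, $\mathbb{E}[X_i X_j] = \theta_i \theta_j$, hence $|\theta_i| = \sqrt{C_{ij} C_{ik} / C_{jk}}$ for any distinct $j, k$. This is an illustration, not the paper's exact TE procedure; the function names, the dense-labelling assumption (every worker labels every item, at least three workers), and the rule of picking the most correlated pair $(j, k)$ are assumptions.

```python
import itertools
import math
import random

def simulate_labels(p, n_items, seed=0):
    """Hypothetical helper: labels[t][i] in {-1, +1} under the one-coin model."""
    rng = random.Random(seed)
    labels = []
    for _ in range(n_items):
        g = rng.choice([-1, 1])  # true label
        labels.append([g if rng.random() < pi else -g for pi in p])
    return labels

def triangular_reliability(labels):
    """Estimate |theta_i| via the triangle identity C_ij * C_ik / C_jk = theta_i^2."""
    n_items, n = len(labels), len(labels[0])
    # Empirical pairwise correlations C[i][j] ~ E[X_i X_j] = theta_i * theta_j.
    C = [[sum(row[i] * row[j] for row in labels) / n_items for j in range(n)]
         for i in range(n)]
    theta_abs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Most strongly correlated pair keeps the denominator away from zero.
        j, k = max(itertools.combinations(others, 2),
                   key=lambda jk: abs(C[jk[0]][jk[1]]))
        if C[j][k] == 0:
            theta_abs.append(0.0)
            continue
        ratio = C[i][j] * C[i][k] / C[j][k]
        theta_abs.append(math.sqrt(max(ratio, 0.0)))  # clip sampling noise below 0
    return theta_abs
```

With `p = [0.9, 0.8, 0.7, 0.6]` (so $\theta = (0.8, 0.6, 0.4, 0.2)$) and a few thousand items, `triangular_reliability(simulate_labels(p, 20000))` should return values close to $\theta$.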
no code implementations • 23 Feb 2016 • Thomas Bonald, Richard Combes
We propose a streaming algorithm for the binary classification of data based on crowdsourcing.
no code implementations • 17 Nov 2015 • Richard Combes
We generalize McDiarmid's inequality for functions with bounded differences on a high probability set, using an extension argument.
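For reference, the classical inequality being generalized reads as follows (the high-probability-set extension itself is not reproduced here). Let $X_1, \dots, X_n$ be independent and let $f$ satisfy $|f(x) - f(x')| \le c_i$ whenever $x$ and $x'$ differ only in coordinate $i$. Then:

```latex
\Pr\Big( f(X_1,\dots,X_n) - \mathbb{E}\big[f(X_1,\dots,X_n)\big] \ge t \Big)
  \le \exp\!\left( -\frac{2t^2}{\sum_{i=1}^{n} c_i^2} \right), \qquad t > 0.
```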
1 code implementation • NeurIPS 2015 • Richard Combes, M. Sadegh Talebi, Alexandre Proutiere, Marc Lelarge
In the adversarial setting under bandit feedback, we propose CombEXP, an algorithm with the same regret scaling as state-of-the-art algorithms, but with lower computational complexity for some combinatorial problems.
no code implementations • 28 Jun 2014 • Richard Combes, Alexandre Proutiere
To our knowledge, the SP algorithm constitutes the first sequential arm selection rule that achieves a regret and optimization error scaling as $O(\sqrt{T})$ and $O(1/\sqrt{T})$, respectively, up to a logarithmic factor for non-smooth expected reward functions, as well as for smooth functions with unknown smoothness.
no code implementations • 20 May 2014 • Richard Combes, Alexandre Proutiere
We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards smoothly evolve over time.
no code implementations • 19 May 2014 • Stefan Magureanu, Richard Combes, Alexandre Proutiere
For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem.
no code implementations • 23 Feb 2014 • Richard Combes, Alexandre Proutiere
In turn, the proposed algorithms optimally exploit the inherent structure of the throughput.
no code implementations • 27 Sep 2013 • M. Sadegh Talebi, Zhenhua Zou, Richard Combes, Alexandre Proutiere, Mikael Johansson
The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays.
no code implementations • 11 Jun 2013 • Richard Combes, Ilham El Bouloumi, Stephane Senecal, Zwi Altman
The purpose of this paper is to develop a self-optimized association algorithm based on PGRL (Policy Gradient Reinforcement Learning) that is scalable, stable and robust.