Search Results for author: Richard Combes

Found 20 papers, 6 papers with code

Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

no code implementations • 24 Mar 2021 • Wei Huang, Richard Combes, Cindy Trinh

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information.

Multi-Armed Bandits

A High Performance, Low Complexity Algorithm for Multi-Player Bandits Without Collision Sensing Information

1 code implementation • 19 Feb 2021 • Cindy Trinh, Richard Combes

Motivated by applications in cognitive radio networks, we consider the decentralized multi-player multi-armed bandit problem, without collision nor sensing information.
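
As a concrete picture of the feedback model described above, the sketch below simulates it: $M$ players pull arms simultaneously, a player whose arm is also pulled by someone else observes reward 0, and no player can tell whether a 0 came from a collision or from the arm itself. The environment, arm means, and the uniform placeholder policy are illustrative assumptions; this is not the algorithm proposed in either paper above.

```python
import numpy as np

# Toy simulation of the no-sensing multi-player bandit feedback model:
# a colliding player observes reward 0 and cannot distinguish a collision
# from a genuine zero reward of the arm. Not the papers' algorithms.

rng = np.random.default_rng(0)
K, M, T = 5, 3, 10_000             # arms, players, horizon (assumed values)
means = rng.uniform(0.1, 0.9, K)   # Bernoulli arm means, unknown to players

def step(choices):
    """choices[m] is the arm pulled by player m; returns observed rewards."""
    counts = np.bincount(choices, minlength=K)
    rewards = np.zeros(M)
    for m, a in enumerate(choices):
        if counts[a] == 1:                      # no collision on this arm
            rewards[m] = rng.random() < means[a]
        # else: collision -> observed reward is 0, indistinguishable from
        # a genuine 0 reward drawn from the arm
    return rewards

total = 0.0
for t in range(T):
    choices = rng.integers(0, K, size=M)        # placeholder: uniform play
    total += step(choices).sum()
print("average reward per round:", total / T)
```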

Solving Random Parity Games in Polynomial Time

no code implementations • 16 Jul 2020 • Richard Combes, Mikael Touati

We further propose the SWCP (Self-Winning Cycles Propagation) algorithm and show that, when the degree is large enough, SWCP solves the game with high probability.

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

1 code implementation • 17 Feb 2020 • Thibaut Cuvelier, Richard Combes, Eric Gourdin

We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0, 1\}^d$ where rewards are uncorrelated across items.
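
To make the semi-bandit feedback over ${\cal X} \subset \{0, 1\}^d$ concrete, here is a minimal CUCB-style baseline for the simple decision set "choose exactly $m$ of the $d$ items", where the combinatorial oracle is just a top-$m$ selection. It only illustrates the feedback model (the reward of every selected item is observed); it is not the statistically efficient, polynomial-time algorithm of the paper, and the means, horizon, and confidence bonus are assumptions.

```python
import numpy as np

# CUCB-style baseline for combinatorial semi-bandits over X ⊂ {0,1}^d,
# specialized to the toy decision set "pick exactly m of d items".
# Semi-bandit feedback: the reward of every selected item is observed.

rng = np.random.default_rng(1)
d, m, T = 10, 3, 20_000
theta = rng.uniform(0.0, 1.0, d)           # unknown per-item Bernoulli means

counts = np.zeros(d)
sums = np.zeros(d)

for t in range(1, T + 1):
    ucb = np.full(d, np.inf)               # forces each item to be tried once
    played = counts > 0
    ucb[played] = sums[played] / counts[played] \
        + np.sqrt(1.5 * np.log(t) / counts[played])
    x = np.argsort(ucb)[-m:]               # oracle: m items with largest indices
    obs = (rng.random(m) < theta[x]).astype(float)   # per-item observations
    counts[x] += 1
    sums[x] += obs

print("best m items :", sorted(np.argsort(theta)[-m:].tolist()))
print("most played  :", sorted(np.argsort(counts)[-m:].tolist()))
```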

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

no code implementations • 6 Dec 2019 • Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes

Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization problems over rank-one matrices of arms.

Thompson Sampling
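
The rank-one setting can be stated concretely: pulling entry $(i, j)$ of a $K \times L$ matrix yields a Bernoulli reward with mean $u_i v_j$. The sketch below runs plain Thompson Sampling over all $K \cdot L$ entries, deliberately ignoring the rank-one and unimodal structure that UTS exploits; all numerical values are illustrative assumptions.

```python
import numpy as np

# Rank-one Bernoulli bandit environment plus a vanilla Thompson Sampling
# baseline over the K*L entries (structure-agnostic, unlike the paper's UTS).

rng = np.random.default_rng(2)
K, L, T = 4, 5, 20_000
u, v = rng.uniform(0.2, 0.9, K), rng.uniform(0.2, 0.9, L)
mu = np.outer(u, v)                      # rank-one matrix of means

alpha = np.ones((K, L))                  # Beta posterior parameters
beta = np.ones((K, L))
for t in range(T):
    sample = rng.beta(alpha, beta)       # one posterior sample per entry
    i, j = np.unravel_index(np.argmax(sample), (K, L))
    r = rng.random() < mu[i, j]          # Bernoulli reward with mean u_i * v_j
    alpha[i, j] += r
    beta[i, j] += 1 - r

print("best entry  :", np.unravel_index(np.argmax(mu), (K, L)))
print("most sampled:", np.unravel_index(np.argmax(alpha + beta), (K, L)))
```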

Computationally Efficient Estimation of the Spectral Gap of a Markov Chain

no code implementations • 15 Jun 2018 • Richard Combes, Mikael Touati

We consider the problem of estimating from sample paths the absolute spectral gap $\gamma_*$ of a reversible, irreducible and aperiodic Markov chain $(X_t)_{t \in \mathbb{N}}$ over a finite state space $\Omega$.
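
The target quantity is $\gamma_* = 1 - \max_{i \ge 2} |\lambda_i(P)|$, where $\lambda_i(P)$ are the eigenvalues of the transition matrix. Below is a naive plug-in sketch (estimate $P$ from transition counts, then take eigenvalues) that only illustrates what is being estimated; it is not the estimator or the confidence bounds developed in the paper, and the example chain is an assumed toy.

```python
import numpy as np

# Naive plug-in illustration of the absolute spectral gap
# gamma_* = 1 - max_{i >= 2} |lambda_i(P)| of a finite Markov chain.

rng = np.random.default_rng(3)
P = np.array([[0.8, 0.2, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])          # small birth-death chain (hence reversible)
n = P.shape[0]

# simulate one sample path (X_t) and count transitions
T = 50_000
x = 0
counts = np.ones((n, n)) * 1e-3          # tiny smoothing to avoid zero rows
for _ in range(T):
    y = rng.choice(n, p=P[x])
    counts[x, y] += 1
    x = y

P_hat = counts / counts.sum(axis=1, keepdims=True)

def abs_spectral_gap(M):
    eig = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return 1.0 - eig[1]                  # drop the leading eigenvalue 1

print("true gamma_*     :", abs_spectral_gap(P))
print("plug-in estimate :", abs_spectral_gap(P_hat))
```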

Minimal Exploration in Structured Stochastic Bandits

no code implementations • NeurIPS 2017 • Richard Combes, Stefan Magureanu, Alexandre Proutiere

This paper introduces and addresses a wide class of stochastic bandit problems where the function mapping the arm to the corresponding reward exhibits some known structural properties.

Thompson Sampling
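
This line of work builds on the classical Graves–Lai problem-specific lower bound for structured bandits, restated below in generic notation (which may differ from the paper's own formulation).

```latex
% Graves–Lai style lower bound (generic notation). Any uniformly good
% algorithm satisfies liminf_{T -> infty} R(T)/log T >= C(theta), where
\begin{align*}
C(\theta) = \min_{\eta \ge 0}\ & \sum_{x} \eta(x)\,\bigl(\mu^\star(\theta) - \mu(x,\theta)\bigr) \\
\text{s.t.}\ & \sum_{x} \eta(x)\, d(x;\theta,\lambda) \ge 1
  \quad \text{for all confusing } \lambda \in \Lambda(\theta),
\end{align*}
% with d(x; theta, lambda) the KL divergence between the reward distributions
% of arm x under theta and lambda, and Lambda(theta) the set of parameters in
% the structure that cannot be distinguished from theta by playing only
% optimal arms yet have a different optimal arm.
```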

A Minimax Optimal Algorithm for Crowdsourcing

no code implementations • NeurIPS 2017 • Thomas Bonald, Richard Combes

We further propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers.
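
The sketch below illustrates the "triangle" identity that makes pairwise answer correlations informative about individual reliabilities in the one-coin worker model: with $a_i = 2p_i - 1$ and $\pm 1$ answers, correlations satisfy $E[X_i X_j] = a_i a_j$, hence $a_i^2 = C_{ij} C_{ik} / C_{jk}$. The simulation parameters are assumptions, and the exact TE algorithm (triangle selection, sign handling, error bounds) is not reproduced here.

```python
import numpy as np

# Triangle identity behind reliability estimation in the one-coin model:
# workers report the true label G with probability p_i, its opposite otherwise;
# with a_i = 2 p_i - 1, E[X_i X_j] = a_i a_j, so a_i^2 = C_ij * C_ik / C_jk.

rng = np.random.default_rng(4)
n_workers, n_tasks = 5, 20_000
p = rng.uniform(0.6, 0.95, n_workers)          # unknown worker accuracies
a = 2 * p - 1

G = rng.choice([-1, 1], size=n_tasks)          # uniform true labels
correct = rng.random((n_workers, n_tasks)) < p[:, None]
X = np.where(correct, G, -G)                   # workers' +/-1 answers

C = X @ X.T / n_tasks                          # empirical answer correlations

def estimate_a(i, j, k):
    return np.sqrt(abs(C[i, j] * C[i, k] / C[j, k]))

print("true a_0     :", a[0])
print("triangle est.:", estimate_a(0, 1, 2))
```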

A Streaming Algorithm for Crowdsourced Data Classification

no code implementations • 23 Feb 2016 • Thomas Bonald, Richard Combes

We propose a streaming algorithm for the binary classification of data based on crowdsourcing.

Binary Classification • Classification +1

An extension of McDiarmid's inequality

no code implementations • 17 Nov 2015 • Richard Combes

We generalize McDiarmid's inequality for functions with bounded differences on a high probability set, using an extension argument.
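
For reference, the standard bounded-differences inequality being generalized is the following, stated in generic notation.

```latex
% McDiarmid's (bounded-differences) inequality. If X_1, ..., X_n are
% independent and |f(x) - f(x')| <= c_i whenever x and x' differ only in
% coordinate i, then for all t > 0:
\[
\mathbb{P}\bigl(f(X_1,\dots,X_n) - \mathbb{E} f(X_1,\dots,X_n) \ge t\bigr)
\;\le\; \exp\!\left(-\frac{2t^2}{\sum_{i=1}^{n} c_i^2}\right).
\]
```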

Combinatorial Bandits Revisited

1 code implementation • NeurIPS 2015 • Richard Combes, M. Sadegh Talebi, Alexandre Proutiere, Marc Lelarge

In the adversarial setting under bandit feedback, we propose CombEXP, an algorithm with the same regret scaling as state-of-the-art algorithms, but with lower computational complexity for some combinatorial problems.

Unimodal Bandits without Smoothness

no code implementations • 28 Jun 2014 • Richard Combes, Alexandre Proutiere

To our knowledge, the SP algorithm constitutes the first sequential arm selection rule that achieves a regret and optimization error scaling as $O(\sqrt{T})$ and $O(1/\sqrt{T})$, respectively, up to a logarithmic factor for non-smooth expected reward functions, as well as for smooth functions with unknown smoothness.

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

no code implementations • 20 May 2014 • Richard Combes, Alexandre Proutiere

We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards smoothly evolve over time.

Multi-Armed Bandits
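
A simplified way to picture leader-plus-neighborhood exploration for unimodal structures is sketched below on a line graph, with plain UCB indices standing in for the KL-UCB indices and leader counts used by OSUB; the arm means and parameters are illustrative assumptions.

```python
import numpy as np

# Toy leader-neighborhood exploration for unimodal bandits on a line graph:
# identify the empirical leader, then apply a UCB rule only to the leader and
# its two neighbors. Simplified illustration, not the exact OSUB algorithm.

rng = np.random.default_rng(5)
K, T = 11, 30_000
means = 0.9 - 0.08 * np.abs(np.arange(K) - 6)     # unimodal, peak at arm 6

counts = np.zeros(K)
sums = np.zeros(K)

for t in range(1, T + 1):
    if t <= K:                        # initialization: play each arm once
        a = t - 1
    else:
        mu_hat = sums / counts
        leader = int(np.argmax(mu_hat))
        cand = [c for c in (leader - 1, leader, leader + 1) if 0 <= c < K]
        ucb = mu_hat[cand] + np.sqrt(2 * np.log(t) / counts[cand])
        a = cand[int(np.argmax(ucb))]
    r = rng.random() < means[a]       # Bernoulli reward
    counts[a] += 1
    sums[a] += r

print("best arm    :", int(np.argmax(means)))
print("most played :", int(np.argmax(counts)))
```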

Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

no code implementations • 19 May 2014 • Stefan Magureanu, Richard Combes, Alexandre Proutiere

For discrete Lipschitz bandits, we derive asymptotic problem specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem.

Multi-Armed Bandits
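
One generic way a Lipschitz structure tightens optimistic indices: since $|\mu(x) - \mu(y)| \le L |x - y|$, every arm $y$ provides a valid upper bound $\mathrm{UCB}_y + L|x - y|$ on $\mu(x)$, and one can take the minimum over $y$. The sketch below illustrates this idea only; it is not OSLB or CKL-UCB, and the reward function, noise level, and constants are assumptions.

```python
import numpy as np

# Generic Lipschitz tightening of UCB indices on a discrete arm set:
# index(x) = min_y ( UCB_y + L * |x - y| ), valid since mu is L-Lipschitz.

rng = np.random.default_rng(8)
K, T, L = 10, 30_000, 2.0
xs = np.linspace(0.0, 1.0, K)
means = 0.5 + 0.3 * np.sin(2 * np.pi * xs)     # Lipschitz constant 0.6*pi < L

counts = np.zeros(K)
sums = np.zeros(K)
dist = np.abs(xs[:, None] - xs[None, :])

for t in range(1, T + 1):
    if t <= K:                                  # play each arm once
        a = t - 1
    else:
        mu_hat = sums / counts
        ucb = mu_hat + np.sqrt(2 * np.log(t) / counts)
        index = (ucb[None, :] + L * dist).min(axis=1)   # Lipschitz tightening
        a = int(np.argmax(index))
    r = means[a] + 0.1 * rng.standard_normal()  # noisy reward observation
    counts[a] += 1
    sums[a] += r

print("best arm    :", int(np.argmax(means)))
print("most played :", int(np.argmax(counts)))
```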

Dynamic Rate and Channel Selection in Cognitive Radio Systems

no code implementations • 23 Feb 2014 • Richard Combes, Alexandre Proutiere

In turn, the proposed algorithms optimally exploit the inherent structure of the throughput.

Stochastic Online Shortest Path Routing: The Value of Feedback

no code implementations • 27 Sep 2013 • M. Sadegh Talebi, Zhenhua Zou, Richard Combes, Alexandre Proutiere, Mikael Johansson

The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays.
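
The feedback model can be illustrated as follows: each routed packet reveals the realized delay of every link on its path (semi-bandit feedback), and the router keeps optimistic lower-confidence delay estimates that are fed to a shortest-path computation. The toy network, delay distributions, and index below are assumptions, not one of the policies analyzed in the paper.

```python
import heapq
import numpy as np

# Online shortest-path routing sketch with per-link (semi-bandit) feedback:
# optimistic (lower-confidence) delay estimates + Dijkstra on the toy graph.

rng = np.random.default_rng(6)
edges = {("s", "a"): 0.3, ("s", "b"): 0.6, ("a", "b"): 0.1,
         ("a", "t"): 0.7, ("b", "t"): 0.2}        # unknown mean link delays
graph = {}
for (u, v) in edges:
    graph.setdefault(u, []).append(v)
graph.setdefault("t", [])

counts = {e: 0 for e in edges}
sums = {e: 0.0 for e in edges}

def optimistic_delay(e, t):
    if counts[e] == 0:
        return 0.0
    return max(0.0, sums[e] / counts[e] - np.sqrt(2 * np.log(t) / counts[e]))

def shortest_path(weights):
    """Dijkstra from 's' to 't' with the given non-negative edge weights."""
    dist, prev, heap = {"s": 0.0}, {}, [(0.0, "s")]
    while heap:
        d, u = heapq.heappop(heap)
        if u == "t":
            break
        if d > dist.get(u, np.inf):
            continue
        for v in graph[u]:
            nd = d + weights[(u, v)]
            if nd < dist.get(v, np.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], "t"
    while node != "s":
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]

for t in range(1, 5001):
    w = {e: optimistic_delay(e, t) for e in edges}
    path = shortest_path(w)
    for e in path:                                  # semi-bandit feedback
        delay = rng.exponential(edges[e])           # realized link delay
        counts[e] += 1
        sums[e] += delay

print("estimated best path:",
      shortest_path({e: sums[e] / max(counts[e], 1) for e in edges}))
print("true best path     :", shortest_path(edges))
```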

The association problem in wireless networks: a Policy Gradient Reinforcement Learning approach

no code implementations • 11 Jun 2013 • Richard Combes, Ilham El Bouloumi, Stephane Senecal, Zwi Altman

The purpose of this paper is to develop a self-optimized association algorithm based on PGRL (Policy Gradient Reinforcement Learning), which is scalable, stable, and robust.

Q-Learning • Reinforcement Learning +1
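
As a minimal illustration of the PGRL principle (not the association algorithm developed in the paper), the sketch below runs REINFORCE with a softmax association policy on a toy problem where each station's capacity is shared equally among its users and the reward is a proportional-fairness utility; the model, parameters, and learning rates are all assumptions.

```python
import numpy as np

# Minimal REINFORCE sketch for a toy user-association problem: N users are
# each assigned to one of B stations by a softmax policy; a station's capacity
# is shared equally among its users; reward = sum of log rates.

rng = np.random.default_rng(7)
N, B = 6, 3
capacity = np.array([3.0, 2.0, 1.0])      # per-station capacities (toy values)
theta = np.zeros((N, B))                  # policy parameters, one row per user
lr, baseline = 0.05, 0.0

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

for it in range(3000):
    probs = softmax(theta)
    assoc = np.array([rng.choice(B, p=probs[u]) for u in range(N)])
    load = np.bincount(assoc, minlength=B)
    rates = capacity[assoc] / load[assoc]             # equal sharing per station
    reward = np.log(rates).sum()                      # proportional fairness

    # REINFORCE update: (reward - baseline) * grad log pi(assoc | theta)
    grad = -probs
    grad[np.arange(N), assoc] += 1.0
    theta += lr * (reward - baseline) * grad
    baseline += 0.05 * (reward - baseline)            # running reward baseline

print("final association probabilities:\n", np.round(softmax(theta), 2))
```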
