About

Multi-armed bandits refer to a class of problems in which a fixed amount of resources must be allocated among competing choices (the "arms") so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
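The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy policy: with probability epsilon pull a random arm, otherwise pull the arm with the best average reward so far. This is an illustrative sketch only; `pull` is a hypothetical callback returning the reward of the chosen arm.

```python
import random

def epsilon_greedy_bandit(pull, n_arms, n_rounds, epsilon=0.1):
    """Epsilon-greedy multi-armed bandit: explore a random arm with
    probability epsilon, otherwise exploit the best arm seen so far.
    `pull(arm)` is a hypothetical environment callback returning a reward."""
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])   # exploit
        reward = pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]     # incremental mean
        total += reward
    return values, total
```

With a deterministic environment where only one arm pays off, the policy quickly locks onto that arm while still spending an epsilon fraction of rounds exploring.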


Greatest papers with code

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

ICLR 2018 tensorflow/models

At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.

DECISION MAKING MULTI-ARMED BANDITS

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

4 Feb 2014 VowpalWabbit/vowpal_wabbit

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

MULTI-ARMED BANDITS
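The interaction protocol described above (observe a context, take one of $K$ actions, see the reward only for the chosen action) can be sketched as a generic loop. `choose`, `update`, and `reward_fn` are hypothetical callbacks standing in for a concrete learner and environment, not part of any library's API.

```python
def contextual_bandit_loop(choose, update, contexts, reward_fn):
    """Generic contextual bandit protocol: each round the learner observes
    a context, picks an action, and receives partial feedback (the reward
    of the chosen action only), then updates its policy."""
    total = 0.0
    for x in contexts:
        a = choose(x)          # learner's action for this context
        r = reward_fn(x, a)    # reward revealed only for action a
        update(x, a, r)        # learner updates from the triple (x, a, r)
        total += r
    return total
```

The defining feature, compared with full-information learning, is that `reward_fn` is queried for a single action per round, so the learner never sees what the other arms would have paid.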

Gaussian Gated Linear Networks

NeurIPS 2020 deepmind/deepmind-research

We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.

DENOISING DENSITY ESTIMATION MULTI-ARMED BANDITS

A Survey on Contextual Multi-armed Bandits

13 Aug 2015 bgalbraith/bandits

In this survey we cover a few stochastic and adversarial contextual bandit algorithms.

MULTI-ARMED BANDITS

Adapting multi-armed bandits policies to contextual bandits scenarios

11 Nov 2018 david-cortes/contextualbandits

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.

MULTI-ARMED BANDITS
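The black-box-oracle idea can be sketched with one online logistic-regression model per arm and an epsilon-greedy rule over their predicted reward probabilities. Class and function names here are illustrative assumptions, not the paper's actual implementation, which supports arbitrary classification oracles.

```python
import math
import random

class LogisticOracle:
    """A per-arm online logistic-regression reward model, trained by SGD.
    Illustrative stand-in for the black-box classification oracle."""
    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.lr = lr

    def prob(self, x):
        """Predicted probability of a positive (binary) reward."""
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One SGD step on the log-loss; y is the observed reward in {0, 1}."""
        err = self.prob(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi

def choose_arm(oracles, x, epsilon=0.1):
    """Epsilon-greedy over the oracles' predicted reward probabilities."""
    if random.random() < epsilon:
        return random.randrange(len(oracles))
    return max(range(len(oracles)), key=lambda a: oracles[a].prob(x))
```

Only the chosen arm's oracle is updated each round, mirroring the partial-feedback structure of the contextual bandit setting.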

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

2 Apr 2020 ymy4323460/HATCH

In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.

MULTI-ARMED BANDITS

Carousel Personalization in Music Streaming Apps with Contextual Bandits

14 Sep 2020 deezer/carousel_bandits

Media services providers, such as music streaming platforms, frequently leverage swipeable carousels to recommend personalized content to their users.

MULTI-ARMED BANDITS

Neural Contextual Bandits with UCB-based Exploration

ICML 2020 sauxpa/neural_exploration

To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

EFFICIENT EXPLORATION MULTI-ARMED BANDITS
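For context on the UCB-based exploration the paper builds on: classic tabular UCB1 pulls the arm maximizing the empirical mean reward plus an exploration bonus $\sqrt{2 \ln t / n_a}$, where $n_a$ is the number of pulls of arm $a$. This sketch is the tabular analogue only (the paper's algorithm replaces the mean estimate with a neural network); `pull` is a hypothetical reward callback.

```python
import math

def ucb1(pull, n_arms, n_rounds):
    """Tabular UCB1: after pulling each arm once, pull the arm maximizing
    empirical mean + sqrt(2 ln t / n_a). `pull(arm)` is a hypothetical
    environment callback returning a reward."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1                       # initialize: pull each arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        total += r
    return means, counts, total
```

The bonus shrinks as an arm is pulled more often, so under-explored arms keep getting revisited while the empirically best arm dominates in the long run.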

Model selection for contextual bandits

NeurIPS 2019 akshaykr/oracle_cb

We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.

MODEL SELECTION MULTI-ARMED BANDITS