The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
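To make the exploration/exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy policy on simulated Bernoulli arms. It is only one of many possible strategies, and every name in it (`epsilon_greedy`, `true_means`, the default parameters) is an assumption made for illustration, not something taken from the papers listed on this page.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on simulated Bernoulli arms.

    true_means[a] is the (hypothetical) success probability of arm a.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the best current estimate
            # (ties go to the lowest index).
            arm = max(range(k), key=lambda a: estimates[a])

        # Simulated environment: Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return estimates, counts, total


if __name__ == "__main__":
    est, counts, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("total reward:", total)
```

A larger `epsilon` spends more pulls on exploration and a smaller one commits earlier to the arm that currently looks best. That choice is the trade-off described above.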



Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

Advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.

69,480

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

4 Feb 2014 · VowpalWabbit/vowpal_wabbit

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

7,522
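The interaction protocol described in the "Taming the Monster" abstract above (observe a context, choose one of $K$ actions, see the reward for that action only) can be sketched as a simple simulation loop. This is not the algorithm from that paper. The learner is a plain tabular epsilon-greedy policy, and the environment, the function name `run_contextual_bandit`, and the `reward_prob` table are all hypothetical.

```python
import random


def run_contextual_bandit(reward_prob, rounds=2000, epsilon=0.1, seed=0):
    """Simulate the contextual bandit protocol with bandit feedback.

    reward_prob[x][a] is the (hypothetical) probability that action a
    yields reward 1 in discrete context x. Returns (values, log), where
    log holds one (context, action, reward) triple per round.
    """
    rng = random.Random(seed)
    n_contexts = len(reward_prob)
    k = len(reward_prob[0])
    counts = [[0] * k for _ in range(n_contexts)]
    values = [[0.0] * k for _ in range(n_contexts)]
    log = []

    for _ in range(rounds):
        # 1. The environment reveals a context.
        x = rng.randrange(n_contexts)

        # 2. The learner picks one of K actions given that context.
        if rng.random() < epsilon:
            a = rng.randrange(k)
        else:
            a = max(range(k), key=lambda i: values[x][i])

        # 3. Only the reward of the chosen action is observed;
        #    rewards of the other actions stay hidden.
        r = 1.0 if rng.random() < reward_prob[x][a] else 0.0

        # 4. The learner updates its estimate for (context, action).
        counts[x][a] += 1
        values[x][a] += (r - values[x][a]) / counts[x][a]
        log.append((x, a, r))

    return values, log


if __name__ == "__main__":
    # Two contexts whose best actions differ.
    values, log = run_contextual_bandit([[0.9, 0.1], [0.2, 0.8]])
    print(values)
```

The log stores a single reward per round. This partial feedback is what separates the setting from supervised learning, where the outcome of every action would be known.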

Gaussian Gated Linear Networks

We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.

6,459

Locally Differentially Private (Contextual) Bandits Learning

We study locally differentially private (LDP) bandits learning in this paper.

2,160

A Survey on Contextual Multi-armed Bandits

13 Aug 2015 · bgalbraith/bandits

In this survey we cover a few stochastic and adversarial contextual bandit algorithms.

550

Adapting multi-armed bandits policies to contextual bandits scenarios

11 Nov 2018 · david-cortes/contextualbandits

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.

397
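The idea in the abstract above, using binary classifiers as black-box oracles behind a bandit policy, can be illustrated roughly as follows. This sketch is neither the paper's method nor the API of the contextualbandits package. It assumes one online logistic regression per arm, each trained only on the rounds where its arm was played, combined with epsilon-greedy selection. The class names `OnlineLogistic` and `EpsilonGreedyOracles` are hypothetical.

```python
import math
import random


class OnlineLogistic:
    """Tiny logistic regression trained by SGD (illustrative oracle)."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * (dim + 1)  # w[0] is the bias term
        self.lr = lr

    def predict_proba(self, x):
        z = self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # One gradient step on the log loss for a single example.
        g = self.predict_proba(x) - y
        self.w[0] -= self.lr * g
        for i, xi in enumerate(x):
            self.w[i + 1] -= self.lr * g * xi


class EpsilonGreedyOracles:
    """Epsilon-greedy over per-arm classifiers (assumed design)."""

    def __init__(self, n_arms, dim, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.oracles = [OnlineLogistic(dim) for _ in range(n_arms)]

    def choose(self, x):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.oracles))
        scores = [o.predict_proba(x) for o in self.oracles]
        return max(range(len(scores)), key=scores.__getitem__)

    def learn(self, x, arm, reward):
        # Binary reward: only the played arm's oracle sees the example.
        self.oracles[arm].update(x, reward)


if __name__ == "__main__":
    policy = EpsilonGreedyOracles(n_arms=3, dim=2)
    context = [1.0, 0.0]
    arm = policy.choose(context)
    policy.learn(context, arm, 1.0)
```

Keeping a separate classifier per arm means each oracle sees only the examples where its arm was chosen, so its training data depends on the policy's past decisions.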

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

2 Apr 2020 · ymy4323460/HATCH

In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.

23

Carousel Personalization in Music Streaming Apps with Contextual Bandits

14 Sep 2020 · deezer/carousel_bandits

Media services providers, such as music streaming platforms, frequently leverage swipeable carousels to recommend personalized content to their users.

22

Neural Contextual Bandits with UCB-based Exploration

To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

22

Model selection for contextual bandits

We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.

21