Multi-Armed Bandits

87 papers with code • 1 benchmark • 0 datasets

Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
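As a rough illustration of the exploration/exploitation trade-off, the following is a minimal sketch of an epsilon-greedy learner on Bernoulli arms. The function name `epsilon_greedy`, the arm means, and the parameter values are assumptions chosen for the example; they do not come from any of the papers listed here.

```python
import random

def epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return (pull counts, total reward)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Simulated Bernoulli reward (hypothetical environment).
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward

counts, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
```

With a small `epsilon` the learner spends most rounds on the arm it currently believes is best, while the occasional random pull keeps its estimates of the other arms from going stale.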

(Image credit: Microsoft Research)


Greatest papers with code

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

tensorflow/models ICLR 2018

Advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.

Decision Making • Multi-Armed Bandits
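For readers unfamiliar with Thompson sampling, the sketch below shows the classical Beta-Bernoulli version, where the posterior is exact. It is not the paper's method, which studies approximate posteriors over neural networks; the function name `thompson_sampling` and the arm means are assumptions made for the example.

```python
import random

def thompson_sampling(true_means, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the number of pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1] * n_arms   # Beta(1, 1) prior: 1 + observed successes
    beta = [1] * n_arms    # Beta(1, 1) prior: 1 + observed failures
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one sample of each arm's mean from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated Bernoulli reward (hypothetical environment).
        if rng.random() < true_means[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8])
```

Exploration here comes from posterior uncertainty: an arm with few observations has a wide posterior, so its sampled mean is occasionally the largest even when its average reward so far is low.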

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

VowpalWabbit/vowpal_wabbit 4 Feb 2014

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification • Multi-Armed Bandits
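The interaction protocol described in the abstract above (observe a context, take one of $K$ actions, see the reward only for that action) can be sketched as follows. The environment, the helper `best_action`, and the uniform-random placeholder policy are all hypothetical; the sketch shows the protocol only, not the paper's algorithm.

```python
import random

K = 3  # number of actions

def best_action(context):
    # Hypothetical environment: the context is a number in [0, 1) and the
    # best action is the index of the third of the interval it falls in.
    return min(int(context * K), K - 1)

def run_protocol(n_rounds=1000, seed=0):
    """Simulate the contextual bandit loop with a uniform-random policy."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_rounds):
        # 1. The environment reveals a context.
        context = rng.random()
        # 2. The learner picks one of K actions (placeholder: uniform).
        action = rng.randrange(K)
        # 3. Only the chosen action's reward is observed; the rewards of
        #    the other K - 1 actions are never revealed.
        p_reward = 0.9 if action == best_action(context) else 0.1
        reward = 1 if rng.random() < p_reward else 0
        # Record the probability of the chosen action alongside the outcome.
        log.append((context, action, reward, 1.0 / K))
    return log

log = run_protocol()
```

Each logged record holds a single reward, which is what separates this setting from supervised classification, where the correct label is revealed for every example.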

Gaussian Gated Linear Networks

deepmind/deepmind-research NeurIPS 2020

We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.

Denoising • Density Estimation • +1

A Survey on Contextual Multi-armed Bandits

bgalbraith/bandits 13 Aug 2015

In this survey we cover a few stochastic and adversarial contextual bandit algorithms.

Multi-Armed Bandits

Adapting multi-armed bandits policies to contextual bandits scenarios

david-cortes/contextualbandits 11 Nov 2018

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.

General Classification • Multi-Armed Bandits

Carousel Personalization in Music Streaming Apps with Contextual Bandits

deezer/carousel_bandits 14 Sep 2020

Media services providers, such as music streaming platforms, frequently leverage swipeable carousels to recommend personalized content to their users.

Multi-Armed Bandits

Neural Contextual Bandits with UCB-based Exploration

sauxpa/neural_exploration ICML 2020

To the best of our knowledge, the proposed algorithm is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

Efficient Exploration • Multi-Armed Bandits
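As background for "UCB-based exploration", here is a minimal sketch of the classical, non-contextual UCB1 rule, which adds a confidence bonus to each arm's empirical mean. This is not the neural algorithm from the paper above; the function name `ucb1` and the arm means are assumptions made for the example.

```python
import math
import random

def ucb1(true_means, n_rounds=5000, seed=0):
    """Classical UCB1 on Bernoulli arms; returns the number of pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms     # number of pulls per arm
    means = [0.0] * n_arms    # empirical mean reward per arm
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Optimism: empirical mean plus a bonus that shrinks as the
            # arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Simulated Bernoulli reward (hypothetical environment).
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts

ucb_counts = ucb1([0.2, 0.5, 0.8])
```

Unlike epsilon-greedy, the rule contains no explicit randomization: an arm is revisited only when its confidence bonus makes it look potentially better than the current leader.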

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

ymy4323460/HATCH 2 Apr 2020

In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.

Multi-Armed Bandits

Model selection for contextual bandits

akshaykr/oracle_cb NeurIPS 2019

We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.

Model Selection • Multi-Armed Bandits
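To give a feel for the scale of the bound quoted above, the block below writes out one common form of the regret and plugs in illustrative numbers. The notation ($x_t$ for the context, $a_t$ for the chosen action, $r_t$ for the reward function, $\pi^\star$ for the optimal policy) and the values $T = 10^6$, $d_{m^\star} = 8$ are assumptions made for the example; constants and logarithmic factors hidden by $\tilde{O}$ are ignored.

```latex
% Regret after T rounds: reward of the optimal policy minus reward collected.
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} r_t\bigl(\pi^\star(x_t)\bigr) \;-\; \sum_{t=1}^{T} r_t(a_t)

% Illustrative arithmetic for the rate T^{2/3} d_{m^\star}^{1/3}:
T = 10^{6}, \quad d_{m^\star} = 8
\;\Longrightarrow\;
T^{2/3}\, d_{m^\star}^{1/3} \;=\; 10^{4} \cdot 2 \;=\; 2 \times 10^{4}
```

Under these assumed values the rate is about 2% of $T$, so it grows sublinearly in the horizon and depends only on the dimension of the smallest class that contains the optimal policy.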