Multi-Armed Bandits

11 papers with code · Miscellaneous

The multi-armed bandit problem is a task in which a fixed, limited set of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
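
As a rough illustration of the exploration/exploitation trade-off described above, the following minimal sketch simulates an epsilon-greedy learner on Bernoulli arms. The function name `epsilon_greedy`, the parameters `true_means`, `rounds`, `epsilon`, and `seed`, and the choice of Bernoulli rewards are all assumptions made for this example; none of them come from the papers listed on this page.

```python
import random


def epsilon_greedy(true_means, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (illustrative sketch only).

    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(k), key=lambda a: estimates[a])
        # Hypothetical environment: Bernoulli reward with the arm's true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, total
```

Calling `epsilon_greedy([0.2, 0.5, 0.8])` returns how often each arm was pulled. A larger `epsilon` spends more of the budget exploring, while a smaller one commits sooner to whichever arm currently looks best.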

Greatest papers with code

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

ICLR 2018 tensorflow/models

Advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical. To understand the impact of using an approximate posterior on Thompson Sampling, we benchmark well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems.


Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

4 Feb 2014 VowpalWabbit/vowpal_wabbit

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\tilde{O}(\sqrt{KT/\log N})$ oracle calls across all $T$ rounds, where $N$ is the number of policies in the policy class we compete against.


Adapting multi-armed bandits policies to contextual bandits scenarios

11 Nov 2018 david-cortes/contextualbandits

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles. Some of these adaptations are achieved through bootstrapping or approximate bootstrapping, while others rely on other forms of randomness, resulting in more scalable approaches than previous works, and the ability to work with any type of classification algorithm.


Learning Structural Weight Uncertainty for Sequential Decision-Making

30 Dec 2017 zhangry868/S2VGD

Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications. Bayesian methods, such as Stein variational gradient descent (SVGD), offer an elegant framework to reason about NN model uncertainty.


A Survey on Contextual Multi-armed Bandits

13 Aug 2015 yanyangbaobeiIsEmma/Reinforcement-Learning-Contextual-Bandits

In this survey we cover a few stochastic and adversarial contextual bandit algorithms. We analyze each algorithm's assumptions and regret bound.


Thompson Sampling for Contextual Bandits with Linear Payoffs

15 Sep 2012 yanyangbaobeiIsEmma/Reinforcement-Learning-Contextual-Bandits

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods.
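
To make the "randomized algorithm based on Bayesian ideas" concrete, here is a minimal sketch of Thompson Sampling in its simplest setting: non-contextual Bernoulli arms with independent Beta(1, 1) priors. This is not the linear-payoff contextual algorithm analyzed in the paper; the function name `thompson_sampling`, its parameters, and the simulated environment are assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, rounds=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling (illustrative sketch only).

    Returns (successes per arm, failures per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(rounds):
        # Draw one sample from each arm's Beta posterior
        # (a Beta(1, 1) prior updated with the observed counts).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda a: samples[a])
        # Hypothetical environment: Bernoulli reward with the arm's true mean.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Exploration here comes from posterior uncertainty rather than an explicit exploration rate: arms with few observations have wide posteriors and are occasionally sampled high, while well-estimated poor arms are chosen less and less often.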


The Assistive Multi-Armed Bandit

24 Jan 2019 chanlaw/assistive-bandits

Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences.


Heteroscedastic Bandits with Reneging

29 Oct 2018 Xi-Liu/heteroscedasticbandits

Although shown to be useful in many areas as models for solving sequential decision problems with side observations (contexts), contextual bandits are subject to two major limitations. First, they neglect user "reneging" that occurs in real-world applications.


Incentives in the Dark: Multi-armed Bandits for Evolving Users with Unknown Type

11 Mar 2018 fiezt/Incentive-Bandits

Design of incentives or recommendations to users is becoming more common as platform providers continually emerge. We propose a multi-armed bandit approach to the problem in which user types are unknown a priori and evolve dynamically in time.


Contextual Bandits with Stochastic Experts

23 Feb 2018 rajatsen91/CB_StochasticExperts

We consider the problem of contextual bandits with stochastic experts, which is a variation of the traditional stochastic contextual bandit with experts problem. We propose upper-confidence bound (UCB) algorithms for this problem, which employ two different importance-sampling-based estimators for the mean reward of each expert.
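
The upper-confidence bound principle mentioned above can be sketched with the classic UCB1 rule on plain (non-contextual) Bernoulli arms. This does not reproduce the paper's importance-sampling estimators or its expert setting; it swaps in the textbook UCB1 index purely to show how an optimism bonus drives arm selection. The function name `ucb1`, its parameters, and the simulated environment are assumptions for this example.

```python
import math
import random


def ucb1(true_means, rounds=5000, seed=0):
    """Classic UCB1 on Bernoulli arms (illustrative sketch only).

    Returns (pull counts per arm, empirical mean reward per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            # Initialization: pull each arm once so every count is nonzero.
            arm = t - 1
        else:
            # Choose the arm with the largest empirical mean plus bonus;
            # the bonus shrinks as an arm accumulates pulls.
            arm = max(
                range(k),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Hypothetical environment: Bernoulli reward with the arm's true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

Unlike epsilon-greedy or Thompson Sampling, this rule is deterministic given the observed rewards: an arm is revisited only when its confidence interval still overlaps the best arm's estimate.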