# Multi-Armed Bandits

11 papers with code · Miscellaneous

Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices (arms) in a way that maximizes expected gain. Typically, these problems involve an exploration/exploitation trade-off.
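As a minimal illustration of the exploration/exploitation trade-off, the sketch below runs an epsilon-greedy policy on simulated Bernoulli arms. With probability `epsilon` it explores a uniformly random arm; otherwise it exploits the arm with the highest estimated mean reward. The arm probabilities, `epsilon`, and the function name `epsilon_greedy` are illustrative assumptions, not taken from any paper listed here.

```python
import random

def epsilon_greedy(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on simulated Bernoulli arms (illustrative sketch).

    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # how many times each arm was pulled
    values = [0.0] * k    # running mean reward estimate per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore: uniform random arm
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit: best estimate so far
        # Simulated environment: arm pays 1 with probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total += reward
    return counts, total

counts, total = epsilon_greedy([0.2, 0.5, 0.7])
print(counts, total)
```

Raising `epsilon` spends more pulls on exploration. With a fixed `epsilon`, a constant fraction of pulls keeps going to suboptimal arms, which is one motivation for more adaptive strategies such as Thompson Sampling.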

# Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.

Stars: 51,663

# Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

4 Feb 2014 · VowpalWabbit/vowpal_wabbit

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

Stars: 6,273

# Adapting multi-armed bandits policies to contextual bandits scenarios

11 Nov 2018 · david-cortes/contextualbandits

This work explores adaptations of successful multi-armed bandit policies to the online contextual bandit scenario with binary rewards, using binary classification algorithms such as logistic regression as black-box oracles.

Stars: 96
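A rough sketch of the oracle idea from the entry above: one binary classifier per arm estimates the probability of a positive reward given the context, and an epsilon-greedy rule chooses among the arms. Only the chosen arm's reward is observed, so only that arm's classifier is updated. The paper evaluates several policies and classifiers; here a tiny hand-rolled online logistic regression (`OnlineLogReg`, a hypothetical helper) stands in for the black-box oracle, and the simulated environment (`true_w`, Gaussian contexts) is an assumption. This is not the API of the `david-cortes/contextualbandits` package.

```python
import numpy as np

class OnlineLogReg:
    """Hypothetical minimal logistic-regression oracle: one SGD step per observation."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def update(self, x, y):
        grad = self.predict_proba(x) - y  # gradient of log-loss w.r.t. the logit
        self.w -= self.lr * grad * x
        self.b -= self.lr * grad

def run_contextual_eps_greedy(n_rounds=5000, n_arms=3, dim=5, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=(n_arms, dim))  # assumed ground-truth arm parameters
    oracles = [OnlineLogReg(dim) for _ in range(n_arms)]
    rewards = np.zeros(n_rounds)
    for t in range(n_rounds):
        x = rng.normal(size=dim)  # observed context for this round
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))  # explore
        else:
            arm = int(np.argmax([o.predict_proba(x) for o in oracles]))  # exploit
        p = 1.0 / (1.0 + np.exp(-(true_w[arm] @ x)))
        r = float(rng.random() < p)  # binary reward, revealed only for the chosen arm
        oracles[arm].update(x, r)    # partial feedback: only this arm's oracle learns
        rewards[t] = r
    return rewards

rewards = run_contextual_eps_greedy()
print(rewards.mean())
```

Swapping `OnlineLogReg` for any classifier that exposes a probability estimate and an incremental update keeps the same policy structure, which is the sense in which the classifier acts as a black box.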

# Learning Structural Weight Uncertainty for Sequential Decision-Making

30 Dec 2017 · zhangry868/S2VGD

Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications.

Stars: 5

# A Survey on Contextual Multi-armed Bandits

In this survey we cover a few stochastic and adversarial contextual bandit algorithms.

Stars: 3

# Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.

Stars: 3
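The linear-payoff setting of the Thompson Sampling entry above can be sketched as follows, assuming each arm $i$ has a context vector $b_i(t)$ in round $t$ and the expected reward is $b_i(t)^\top \mu$ for an unknown shared parameter $\mu$. Each round draws one sample $\tilde\mu \sim \mathcal{N}(\hat\mu, v^2 B^{-1})$, plays the arm that is best under that sample, and then updates $B \leftarrow B + b\,b^\top$ and $\hat\mu = B^{-1} f$ with $f \leftarrow f + b\,r$. The scale `v` is a hand-picked constant here (the paper derives it from its analysis), and the simulated contexts, `theta`, and noise level are assumptions for illustration.

```python
import numpy as np

def linear_thompson_sampling(contexts, theta_true, noise_sd=0.1, v=0.5, seed=0):
    """Sketch of Thompson Sampling with linear payoffs.

    contexts: array of shape (T, K, d), one feature vector per arm per round.
    theta_true: assumed true parameter used only to simulate rewards.
    """
    rng = np.random.default_rng(seed)
    T, K, d = contexts.shape
    B = np.eye(d)          # starts at the identity (acts like a ridge prior)
    f = np.zeros(d)        # running sum of b * r
    mu_hat = np.zeros(d)   # posterior mean estimate
    chosen = np.empty(T, dtype=int)
    for t in range(T):
        cov = v**2 * np.linalg.inv(B)
        cov = (cov + cov.T) / 2                          # guard against rounding asymmetry
        mu_tilde = rng.multivariate_normal(mu_hat, cov)  # one posterior sample
        arm = int(np.argmax(contexts[t] @ mu_tilde))     # greedy w.r.t. the sample
        b = contexts[t, arm]
        r = b @ theta_true + rng.normal(scale=noise_sd)  # simulated linear payoff + noise
        B += np.outer(b, b)
        f += b * r
        mu_hat = np.linalg.solve(B, f)
        chosen[t] = arm
    return mu_hat, chosen

rng = np.random.default_rng(1)
theta = np.array([1.0, -0.5, 0.25])
ctx = rng.normal(size=(2000, 4, 3))
mu_hat, chosen = linear_thompson_sampling(ctx, theta)
print(np.round(mu_hat, 2))  # should land near theta
```

Randomizing through a posterior sample, rather than a fixed exploration rate, lets exploration shrink as $B$ grows and the posterior concentrates.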

# The Assistive Multi-Armed Bandit

24 Jan 2019 · chanlaw/assistive-bandits

Learning preferences implicit in the choices humans make is a well-studied problem in both economics and computer science.

Stars: 2

# On-line Adaptative Curriculum Learning for GANs

We argue that less expressive discriminators are smoother and have a general coarse-grained view of the modes map, which forces the generator to cover a wide portion of the data distribution support.

Stars: 2

# Heteroscedastic Bandits with Reneging

29 Oct 2018 · Xi-Liu/heteroscedasticbandits

Although shown to be useful in many areas as models for solving sequential decision problems with side observations (contexts), contextual bandits are subject to two major limitations.

Stars: 1

# Contextual Bandits with Stochastic Experts

23 Feb 2018 · rajatsen91/CB_StochasticExperts

We consider the problem of contextual bandits with stochastic experts, which is a variation of the traditional stochastic contextual bandit with experts problem.

Stars: 1