
# Multi-Armed Bandits

15 papers with code · Miscellaneous

Multi-armed bandits refer to a task in which a fixed, limited amount of resources must be allocated between competing choices (arms) so as to maximize expected gain, when each choice's reward distribution is only partially known at allocation time. These problems typically involve an exploration/exploitation trade-off: trying less-known arms to learn about them versus pulling the arm that currently looks best.
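The exploration/exploitation trade-off can be illustrated with an epsilon-greedy policy, one of the simplest bandit strategies: with probability epsilon pull a random arm, otherwise pull the arm with the best empirical mean. A minimal sketch on simulated Bernoulli arms (function and parameter names are illustrative, not from any paper listed here):

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, rounds=10_000, seed=0):
    """Play len(true_means) Bernoulli arms for `rounds` pulls:
    explore a random arm with probability epsilon, otherwise
    exploit the arm with the highest empirical mean so far."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # pulls per arm
    sums = [0.0] * k        # total reward per arm
    total_reward = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore
        else:
            # exploit; unvisited arms get +inf so each is tried once
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a] if counts[a] else float("inf"))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward / rounds
```

With arms of mean 0.2, 0.5, and 0.8 and epsilon = 0.1, the average reward per round approaches roughly 0.9 × 0.8 + 0.1 × 0.5, i.e. well above the 0.5 a uniformly random policy would achieve.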


# Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.

58,394

# Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

4 Feb 2014 · VowpalWabbit/vowpal_wabbit

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

6,649
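The interaction protocol this paper addresses — observe a context, pick one of $K$ actions, see the reward for that action only — can be sketched with a generic learning loop. This is not the paper's algorithm; it is a plain epsilon-greedy learner with one linear reward model per action, and the names `env_step` and `run_contextual_bandit` are hypothetical:

```python
import random

def run_contextual_bandit(env_step, n_actions, dim, rounds=5000,
                          eps=0.1, lr=0.05, seed=0):
    """Contextual-bandit loop: each round the learner sees a context
    vector, picks one of n_actions, and observes the reward for the
    chosen action only. Per-action linear models are fit by SGD."""
    rng = random.Random(seed)
    weights = [[0.0] * dim for _ in range(n_actions)]
    total = 0.0
    for _ in range(rounds):
        ctx, reward_fn = env_step(rng)
        if rng.random() < eps:
            a = rng.randrange(n_actions)  # explore
        else:                             # exploit predicted reward
            a = max(range(n_actions),
                    key=lambda i: sum(w * x for w, x in zip(weights[i], ctx)))
        r = reward_fn(a)                  # bandit feedback: chosen action only
        pred = sum(w * x for w, x in zip(weights[a], ctx))
        for j in range(dim):              # squared-loss SGD update
            weights[a][j] += lr * (r - pred) * ctx[j]
        total += r
    return total / rounds

def env_step(rng):
    """Toy environment: two contexts, each with a different best action."""
    ctx = [1.0, 0.0] if rng.random() < 0.5 else [0.0, 1.0]
    def reward_fn(a):
        best = 0 if ctx[0] == 1.0 else 1
        return 1.0 if rng.random() < (0.9 if a == best else 0.1) else 0.0
    return ctx, reward_fn
```

Because the best action depends on the context, a context-free bandit policy would cap out near 0.5 average reward here, while a contextual learner can approach 0.9.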

# Adapting multi-armed bandits policies to contextual bandits scenarios

11 Nov 2018 · david-cortes/contextualbandits

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.

160

# Learning Structural Weight Uncertainty for Sequential Decision-Making

30 Dec 2017 · zhangry868/S2VGD

Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications.

5

# A Survey on Contextual Multi-armed Bandits

In this survey we cover a few stochastic and adversarial contextual bandit algorithms.

5

# Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.

5
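The Thompson Sampling heuristic mentioned above is easy to state for Bernoulli rewards: keep a Beta posterior per arm, sample one value from each posterior, and pull the arm with the largest sample. A minimal sketch (this is the standard Beta-Bernoulli variant, not the linear-payoff algorithm analyzed in the paper):

```python
import random

def thompson_bernoulli(true_means, rounds=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling: arm a has posterior
    Beta(successes[a] + 1, failures[a] + 1); each round, draw one
    sample per arm and pull the argmax."""
    rng = random.Random(seed)
    k = len(true_means)
    succ = [0] * k
    fail = [0] * k
    total = 0.0
    for _ in range(rounds):
        samples = [rng.betavariate(succ[a] + 1, fail[a] + 1) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        r = 1 if rng.random() < true_means[arm] else 0
        succ[arm] += r
        fail[arm] += 1 - r
        total += r
    return total / rounds
```

Exploration falls out of the posterior sampling itself: uncertain arms produce high samples occasionally and get pulled, while clearly inferior arms are sampled less and less often, so no explicit exploration parameter is needed.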

# Practical Calculation of Gittins Indices for Multi-armed Bandits

11 Sep 2019 · jedwards24/gittins

Gittins indices provide an optimal solution to the classical multi-armed bandit problem.

4

# The Assistive Multi-Armed Bandit

24 Jan 2019 · chanlaw/assistive-bandits

Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science.

2