
# Multi-Armed Bandits

37 papers with code · Miscellaneous

Multi-armed bandits refer to the task of allocating a fixed, limited set of resources among competing alternatives (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: balancing pulls of arms that look best so far against pulls that gather information about the others.

(Image credit: Microsoft Research)
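The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy simulation (a sketch only; the arm probabilities, epsilon, and step count below are arbitrary choices, not taken from any listed paper):

```python
import random

def epsilon_greedy(true_probs, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: explore with probability eps,
    otherwise pull the arm with the highest running mean reward."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    means = [0.0] * k
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means

counts, means = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough steps the highest-paying arm accumulates most of the pulls, while roughly an eps fraction of the budget keeps exploring the alternatives.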

# Leaderboards

# Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.

62,835

# Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

4 Feb 2014 · VowpalWabbit/vowpal_wabbit

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

6,997
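The protocol described above (observe a context, choose one of $K$ actions, see the reward for that chosen action only) can be sketched with a toy tabular learner; the contexts, reward table, and hyperparameters below are invented for illustration and are unrelated to the paper's oracle-based algorithm:

```python
import random

def contextual_loop(T=3000, K=3, n_contexts=2, eps=0.1, seed=1):
    """Toy contextual bandit with discrete contexts: per-(context, arm)
    mean estimates, epsilon-greedy choice, and bandit feedback (the
    reward is revealed only for the action actually taken)."""
    rng = random.Random(seed)
    true_p = [[0.9, 0.2, 0.1],        # hypothetical reward probabilities
              [0.1, 0.3, 0.8]]        # true_p[context][arm]
    counts = [[0] * K for _ in range(n_contexts)]
    means = [[0.0] * K for _ in range(n_contexts)]
    total = 0.0
    for _ in range(T):
        x = rng.randrange(n_contexts)                       # observe context
        if rng.random() < eps:
            a = rng.randrange(K)                            # explore
        else:
            a = max(range(K), key=lambda j: means[x][j])    # exploit
        r = 1.0 if rng.random() < true_p[x][a] else 0.0     # partial feedback
        counts[x][a] += 1
        means[x][a] += (r - means[x][a]) / counts[x][a]
        total += r
    return means, total / T

means, avg = contextual_loop()
```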

# Adapting multi-armed bandits policies to contextual bandits scenarios

11 Nov 2018 · david-cortes/contextualbandits

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.

230
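The black-box-oracle idea can be sketched with one tiny online logistic-regression model per arm driving an epsilon-greedy choice. This is a hand-rolled sketch, not the contextualbandits library's API; the reward model and weights are made up:

```python
import math
import random

class LogisticOracle:
    """Minimal online logistic regression: one model per arm predicts
    the probability of a positive (binary) reward given the context."""
    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        g = self.predict(x) - y            # gradient of the log loss
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * g * xi

def run(T=6000, eps=0.1, seed=2):
    rng = random.Random(seed)
    # hypothetical environment: arm 0 pays off when x[0] > x[1], arm 1 otherwise
    true_w = [[3.0, -3.0], [-3.0, 3.0]]
    oracles = [LogisticOracle(2), LogisticOracle(2)]
    hits = 0
    for t in range(T):
        x = [rng.random(), rng.random()]
        if rng.random() < eps:
            a = rng.randrange(2)           # explore
        else:
            a = max(range(2), key=lambda j: oracles[j].predict(x))
        z = sum(w * xi for w, xi in zip(true_w[a], x))
        y = 1.0 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0.0
        oracles[a].update(x, y)            # only the chosen arm's oracle learns
        if t >= T // 2:                    # score the second half, after learning
            hits += a == (0 if x[0] >= x[1] else 1)
    return hits / (T - T // 2)

accuracy = run()
```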

# Model Selection for Contextual Bandits

We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.

16

# Semiparametric Contextual Bandits

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear, action-independent term.

16

# Practical Calculation of Gittins Indices for Multi-armed Bandits

11 Sep 2019 · jedwards24/gittins

Gittins indices provide an optimal solution to the classical multi-armed bandit problem.

6

# Learning Structural Weight Uncertainty for Sequential Decision-Making

30 Dec 2017 · zhangry868/S2VGD

Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications.

6

# Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.

5
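For the classical (non-contextual) Bernoulli case, Thompson Sampling takes only a few lines: keep a Beta posterior per arm, draw one sample from each, and pull the arm with the largest sample. A textbook sketch follows (the paper itself analyzes the more general linear-payoff contextual setting; the arm probabilities here are arbitrary):

```python
import random

def thompson_bernoulli(true_probs, steps=5000, seed=3):
    """Thompson Sampling for Bernoulli arms with Beta(1, 1) priors."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k     # Beta alpha parameter per arm
    failures = [1] * k      # Beta beta parameter per arm
    pulls = [0] * k
    for _ in range(steps):
        # sample a plausible mean for each arm from its posterior
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(k)]
        arm = samples.index(max(samples))
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8])
```

Sampling from the posterior (rather than taking its mean) is what drives exploration: an under-explored arm has a wide posterior and occasionally produces the largest sample.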

# Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

We propose new estimators for off-policy evaluation (OPE) based on empirical likelihood that are always more efficient than importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimators, and satisfy the same stability and boundedness properties as SNIS.

4
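As context for those baselines: ordinary importance sampling reweights logged rewards by the target-to-behavior propensity ratio, while self-normalized IS divides by the sum of the weights instead of the sample count, which keeps the estimate inside the reward range. A one-step (bandit) sketch with made-up policies and rewards follows; this illustrates only the IS/SNIS baselines, not the paper's empirical-likelihood estimator:

```python
import random

def is_snis(logs, target, behavior):
    """IS and SNIS value estimates for a target policy, computed from
    logged (action, reward) pairs collected under a known behavior policy."""
    weights = [target[a] / behavior[a] for a, _ in logs]
    weighted = [w * r for w, (_, r) in zip(weights, logs)]
    return sum(weighted) / len(logs), sum(weighted) / sum(weights)

rng = random.Random(4)
behavior = [0.5, 0.5]      # logging policy over two actions
target = [0.2, 0.8]        # policy being evaluated
mean_reward = [0.1, 0.9]   # hypothetical per-action mean rewards
logs = []
for _ in range(20000):
    a = 0 if rng.random() < behavior[0] else 1
    r = 1.0 if rng.random() < mean_reward[a] else 0.0
    logs.append((a, r))

is_est, snis_est = is_snis(logs, target, behavior)
# true target-policy value: 0.2 * 0.1 + 0.8 * 0.9 = 0.74
```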