# Multi-Armed Bandits

133 papers with code • 1 benchmark • 1 dataset

Multi-armed bandits refer to a class of sequential decision problems in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off.
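
The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy agent on a Bernoulli bandit. A hedged sketch; the arm means below are illustrative and not taken from any paper listed here:

```python
import random

def epsilon_greedy(true_means, n_rounds=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: explore a random arm with
    probability epsilon, otherwise exploit the empirically best arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # pulls per arm
    values = [0.0] * n_arms      # running empirical mean per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total_reward += reward
    return total_reward / n_rounds
```

With arm means such as (0.2, 0.5, 0.8), the average reward approaches the best arm's mean as exploitation takes over, while the epsilon fraction of random pulls keeps estimating the other arms.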

(Image credit: Microsoft Research)

## Libraries

Use these libraries to find Multi-Armed Bandit models and implementations.

# Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
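
The paper studies deep, approximate posteriors; the Thompson sampling principle itself is easiest to see in the simplest conjugate case. A sketch with Beta-Bernoulli posteriors (this toy setting is my assumption for illustration, not the paper's neural-network one):

```python
import random

def thompson_sampling(true_means, n_rounds=5_000, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors: sample a plausible
    mean for each arm from its posterior and play the argmax."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1] * n_arms   # 1 + observed successes per arm
    beta = [1] * n_arms    # 1 + observed failures per arm
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < true_means[arm]:
            alpha[arm] += 1    # success: posterior shifts up
        else:
            beta[arm] += 1     # failure: posterior shifts down
        pulls[arm] += 1
    return pulls
```

As posteriors concentrate, the sampled means for suboptimal arms rarely win the argmax, so pulls concentrate on the best arm; the deep variants the paper compares replace the Beta posterior with a neural approximation.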

# On-line Adaptative Curriculum Learning for GANs

31 Jul 2018

We argue that less expressive discriminators are smoother and have a coarse-grained view of the modes map, which forces the generator to cover a wide portion of the data distribution support.

# Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling

29 Oct 2018

The DRR framework treats recommendation as a sequential decision-making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between users and the recommender system, which accounts for both dynamic adaptation and long-term rewards.

# Locally Differentially Private (Contextual) Bandits Learning

We study locally differentially private (LDP) bandits learning in this paper.
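
A core ingredient in LDP bandit learning is privatizing each reward before it leaves the user, then debiasing on the learner's side. A minimal sketch using randomized response on binary rewards (function names and the debiasing helper are mine, not the paper's):

```python
import math, random

def privatize(reward, eps, rng):
    """Randomized response: report the true binary reward with probability
    e^eps / (1 + e^eps), otherwise flip it. Satisfies eps-local DP."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return reward if rng.random() < keep else 1 - reward

def debiased_mean(private_rewards, eps):
    """Unbiased estimate of the true reward mean from privatized rewards."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    m = sum(private_rewards) / len(private_rewards)
    return (m - (1.0 - keep)) / (2.0 * keep - 1.0)
```

A bandit algorithm such as UCB can then run on the debiased means, paying an extra variance factor of order 1/(2·keep − 1)² as the price of privacy.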

# Multi-Armed Bandits in Metric Spaces

29 Sep 2008

In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.
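
A first consequence of the Lipschitz assumption is that a fixed uniform discretization already gives a reasonable baseline: nearby arms have nearby payoffs, so a standard finite-armed algorithm such as UCB1 on a grid approximates the continuum (the paper's adaptive approach refines this). A hedged sketch on [0, 1] with an illustrative payoff function:

```python
import math, random

def ucb_on_grid(payoff, n_arms=10, n_rounds=2_000, seed=0):
    """UCB1 over a uniform grid on [0, 1]; `payoff` maps a point in the
    strategy space to a Bernoulli mean reward."""
    rng = random.Random(seed)
    grid = [i / (n_arms - 1) for i in range(n_arms)]
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1    # initialization: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < payoff(grid[arm]) else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return grid[max(range(n_arms), key=lambda a: counts[a])]  # most-pulled point
```

For a 1-Lipschitz payoff such as 1 − |x − 0.7|, pulls concentrate near the maximizer; the discretization error is bounded by the Lipschitz constant times the grid spacing.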

# Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

We study the off-policy evaluation problem (estimating the value of a target policy using data collected by another policy) under the contextual bandit model.
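
The baseline estimator for this problem is inverse propensity scoring (IPS), which reweights each logged reward by the ratio of the target policy's action probability to the logging policy's. A minimal sketch; the data layout and names are illustrative, not from the paper:

```python
def ips_estimate(logged, target_prob):
    """Inverse propensity scoring estimate of a target policy's value.

    `logged` holds (context, action, reward, logging_prob) tuples, where
    logging_prob is the logging policy's probability of the logged action;
    `target_prob(context, action)` is the target policy's probability.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Importance weight corrects for the mismatch between policies.
        total += (target_prob(context, action) / logging_prob) * reward
    return total / len(logged)
```

IPS is unbiased when the logging policy has support wherever the target does, but its variance blows up when the weights are large, which is what motivates the more refined estimators studied in this line of work.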

# Semiparametric Contextual Bandits

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear action-independent term.

# Correlated Multi-armed Bandits with a Latent Random Source

17 Aug 2018

As a result, there are regimes where our algorithm achieves a $\mathcal{O}(1)$ regret as opposed to the typical logarithmic regret scaling of multi-armed bandit algorithms.

# Adapting multi-armed bandits policies to contextual bandits scenarios

11 Nov 2018

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.
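
The black-box reduction can be sketched with the simplest adaptation: epsilon-greedy over one online logistic-regression oracle per arm. The synthetic two-arm environment below is my assumption for illustration, not an experiment from the paper:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def contextual_epsilon_greedy(n_rounds=4_000, epsilon=0.1, lr=0.5, seed=0):
    """Epsilon-greedy contextual bandit with a logistic-regression oracle per
    arm; returns the fraction of rounds the truly best arm was played."""
    rng = random.Random(seed)
    true_w = [(3.0, 0.0), (0.0, 3.0)]    # hypothetical per-arm reward weights
    learned = [[0.0, 0.0], [0.0, 0.0]]   # per-arm oracle weights, learned online
    best_picks = 0
    for _ in range(n_rounds):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if rng.random() < epsilon:
            arm = rng.randrange(2)       # explore uniformly
        else:                            # exploit: score x with each arm's oracle
            arm = max(range(2), key=lambda a: sum(w * xi for w, xi in zip(learned[a], x)))
        p = sigmoid(sum(w * xi for w, xi in zip(true_w[arm], x)))
        reward = 1.0 if rng.random() < p else 0.0
        # One SGD step on the chosen arm's logistic oracle (binary cross-entropy).
        pred = sigmoid(sum(w * xi for w, xi in zip(learned[arm], x)))
        for i in range(2):
            learned[arm][i] += lr * (reward - pred) * x[i]
        best_arm = 0 if x[0] >= x[1] else 1   # argmax of the true reward means
        best_picks += int(arm == best_arm)
    return best_picks / n_rounds
```

The bandit policy only ever queries the oracles for predictions and feeds back (context, reward) pairs for the chosen arm, which is what lets any binary classifier be swapped in.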

# Bayesian Optimisation over Multiple Continuous and Categorical Inputs

Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges.
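
One simple way to see why bandits are relevant here: the categorical dimension can be treated as arms of a bandit while the continuous part is searched separately. The sketch below (UCB1 over categories plus random search, entirely my simplification and not the paper's Gaussian-process-based method) illustrates that decomposition:

```python
import math, random

def mixed_input_search(objective, categories, n_rounds=300, seed=0):
    """Maximize objective(category, x) for x in [0, 1]: UCB1 picks the
    category each round, a random continuous point probes that choice."""
    rng = random.Random(seed)
    counts = {c: 0 for c in categories}
    means = {c: 0.0 for c in categories}
    best = (-math.inf, None, None)       # (value, category, x)
    for t in range(1, n_rounds + 1):
        if t <= len(categories):
            cat = categories[t - 1]      # try each category once
        else:
            cat = max(categories,
                      key=lambda c: means[c] + math.sqrt(2 * math.log(t) / counts[c]))
        x = rng.random()                 # naive continuous search
        y = objective(cat, x)
        counts[cat] += 1
        means[cat] += (y - means[cat]) / counts[cat]
        if y > best[0]:
            best = (y, cat, x)
    return best
```

The bandit quickly routes most evaluations to the promising category; the paper's contribution is replacing the naive continuous step with principled Bayesian optimisation while keeping a bandit-style treatment of the categorical inputs.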
