Multi-Armed Bandits

133 papers with code • 1 benchmark • 1 dataset

Multi-armed bandits refer to a task where a fixed amount of resources must be allocated among competing choices (the "arms") so as to maximize expected gain. Because the payoff of each arm is only learned by trying it, these problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
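
The exploration/exploitation trade-off described above can be illustrated with an epsilon-greedy strategy. This is only a minimal sketch under stated assumptions: the arms pay Bernoulli rewards, and the names `epsilon_greedy`, `true_means`, `horizon`, and `epsilon` are hypothetical choices made for this example rather than anything taken from the listed papers.

```python
import random

def epsilon_greedy(true_means, horizon, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return (pull counts, total reward)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                         # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])   # exploit: best estimate
        # Simulated Bernoulli reward (an assumption of this sketch).
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]     # incremental mean update
        total_reward += reward
    return counts, total_reward

counts, total_reward = epsilon_greedy([0.2, 0.5, 0.8], horizon=5000)
```

With a small `epsilon`, most pulls should end up on the arm with the highest true mean, while the occasional random pull keeps the estimates for the other arms from going stale.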

Most implemented papers

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

tensorflow/models ICLR 2018

At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
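
For readers unfamiliar with Thompson sampling, the simplest case may help: Bernoulli rewards with a conjugate Beta posterior, where no approximation is needed. The sketch below is not the method benchmarked in the paper, which concerns approximate posteriors for neural network models. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the simulated rewards are all assumptions made for illustration.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; return per-arm (successes, failures)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(horizon):
        # Draw one sample per arm from its Beta(1 + successes, 1 + failures) posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated Bernoulli reward (an assumption of this sketch).
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

ts_successes, ts_failures = thompson_sampling([0.2, 0.5, 0.8], horizon=5000)
ts_pulls = [s + f for s, f in zip(ts_successes, ts_failures)]
```

Exploration here comes from posterior uncertainty: arms with few observations have wide posteriors and are occasionally sampled high, so they still get tried.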

On-line Adaptative Curriculum Learning for GANs

Byte7/Adaptive-Curriculum-GAN-keras 31 Jul 2018

We argue that less expressive discriminators are smoother and have a general coarse grained view of the modes map, which enforces the generator to cover a wide portion of the data distribution support.

Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling

backgom2357/Recommender_system_via_deep_RL 29 Oct 2018

The DRR framework treats recommendation as a sequential decision making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between the users and recommender systems, which can consider both the dynamic adaptation and long-term rewards.

Locally Differentially Private (Contextual) Bandits Learning

huang-research-group/LDPbandit2020 NeurIPS 2020

We study locally differentially private (LDP) bandits learning in this paper.

Multi-Armed Bandits in Metric Spaces

facebookresearch/Horizon 29 Sep 2008

In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

facebookresearch/ReAgent ICML 2017

We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model.
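
As a rough illustration of the off-policy evaluation problem, the sketch below uses inverse propensity scoring (IPS), a standard baseline estimator and not necessarily the estimator proposed in the paper. The log format `(context, action, reward, logging_prob)` and the names `ips_estimate` and `always_zero` are hypothetical, chosen only for this example.

```python
def ips_estimate(logged, target_policy):
    """Inverse propensity scoring estimate of a target policy's value.

    logged: list of (context, action, reward, logging_prob) tuples, where
        logging_prob is the probability the logging policy gave to `action`.
    target_policy: function (context, action) -> probability under the target policy.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)

# Toy log: a uniform logging policy over two actions; action 0 always pays 1.
logged = [
    ("x", 0, 1.0, 0.5),
    ("x", 1, 0.0, 0.5),
    ("x", 0, 1.0, 0.5),
    ("x", 1, 0.0, 0.5),
]

def always_zero(context, action):
    """Deterministic target policy that always picks action 0."""
    return 1.0 if action == 0 else 0.0

estimate = ips_estimate(logged, always_zero)
```

In this toy log the estimate equals 1.0, which matches the reward the target policy would collect by always playing action 0. IPS requires the logging probability to be positive wherever the target policy puts mass, and its variance grows when those probabilities are small.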

Semiparametric Contextual Bandits

akshaykr/oracle_cb ICML 2018

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear action-independent term.

Correlated Multi-armed Bandits with a Latent Random Source

shreyasc-13/correlated_bandits 17 Aug 2018

As a result, there are regimes where our algorithm achieves an $\mathcal{O}(1)$ regret as opposed to the typical logarithmic regret scaling of multi-armed bandit algorithms.
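
For context, the regret mentioned in this excerpt is commonly defined as the gap between always playing the best arm and the expected reward actually collected. The notation below ($\mu_a$ for the mean reward of arm $a$, $a_t$ for the arm played at round $t$, $T$ for the horizon) is a standard convention assumed here, not notation taken from the paper.

```latex
R_T = T\mu^{*} - \mathbb{E}\left[\sum_{t=1}^{T} \mu_{a_t}\right],
\qquad \mu^{*} = \max_{a} \mu_a
```

Under this definition, logarithmic scaling means $R_T = \mathcal{O}(\log T)$, whereas $\mathcal{O}(1)$ regret stays bounded no matter how large $T$ becomes.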

Adapting multi-armed bandits policies to contextual bandits scenarios

david-cortes/contextualbandits 11 Nov 2018

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.

Bayesian Optimisation over Multiple Continuous and Categorical Inputs

rubinxin/CoCaBO_code ICML 2020

Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges.