178 papers with code • 1 benchmark • 2 datasets
Multi-armed bandits refer to a task in which a fixed, limited set of resources must be allocated among competing alternatives (arms) so as to maximize expected gain, when each alternative's reward distribution is only partially known. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain arms versus playing the arm currently believed best.
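As a minimal illustration of the exploration/exploitation trade-off (a toy sketch, not taken from any paper listed here), an ε-greedy strategy explores a random arm with probability ε and otherwise exploits the arm with the highest estimated mean reward. The arm probabilities below are arbitrary, hypothetical values:

```python
import random

def epsilon_greedy(reward_fns, n_rounds=1000, epsilon=0.1, seed=0):
    """Pull one of len(reward_fns) arms per round; explore with probability epsilon."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward estimate per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        r = reward_fns[arm](rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean update
        total += r
    return values, total

# Toy Bernoulli arms with (hypothetical) success probabilities 0.2, 0.5, 0.8.
arms = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
values, total = epsilon_greedy(arms)
```

After enough rounds the estimated values concentrate around the true arm means, and the highest-mean arm dominates the pulls.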
(Image credit: Microsoft Research)
Libraries: Use these libraries to find Multi-Armed Bandits models and implementations
The DRR framework treats recommendation as a sequential decision-making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between users and recommender systems, which accounts for both dynamic adaptation and long-term rewards.
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
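For intuition, Thompson sampling in its simplest exact form (a Beta-Bernoulli bandit, not the deep Bayesian variants the paper benchmarks) maintains a posterior per arm, draws one sample from each posterior, and pulls the arm with the largest sample. The arm probabilities below are hypothetical:

```python
import random

def thompson_bernoulli(probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling: keep a Beta(alpha, beta) posterior
    per arm, sample each posterior, and pull the arm with the largest sample."""
    rng = random.Random(seed)
    n = len(probs)
    alpha = [1.0] * n  # Beta(1, 1) uniform priors
    beta = [1.0] * n
    pulls = [0] * n
    for _ in range(n_rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = max(range(n), key=samples.__getitem__)
        reward = 1 if rng.random() < probs[arm] else 0  # simulated Bernoulli reward
        alpha[arm] += reward       # posterior update on success
        beta[arm] += 1 - reward    # posterior update on failure
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8])
```

With neural network reward models the posterior is intractable, which is why the approximate Bayesian methods compared in the paper are needed; the sampling-then-greedy structure stays the same.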
To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
Unfortunately, when the number of actions is large, existing OPE estimators -- most of which are based on inverse propensity score weighting -- degrade severely and can suffer from extreme bias and variance.
We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model.