Multi-Armed Bandits
195 papers with code • 1 benchmark • 2 datasets
Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated between competing choices in a way that maximizes the expected gain. Typically these problems involve an exploration/exploitation trade-off.
(Image credit: Microsoft Research)
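To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy sketch on a synthetic Bernoulli bandit: with probability epsilon the learner explores a random arm, otherwise it exploits the arm with the best empirical mean. The arm means, horizon, and epsilon value are arbitrary example choices, not taken from any paper below.

```python
import numpy as np

def epsilon_greedy_bandit(true_means, horizon=10_000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a stochastic Bernoulli bandit; return total reward and estimates."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)        # number of pulls per arm
    estimates = np.zeros(k)     # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(horizon):
        # Explore with probability epsilon, otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            arm = int(rng.integers(k))
        else:
            arm = int(np.argmax(estimates))

        reward = float(rng.random() < true_means[arm])           # Bernoulli reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
        total_reward += reward

    return total_reward, estimates

# Example: three arms with unknown success probabilities.
reward, est = epsilon_greedy_bandit([0.2, 0.5, 0.7])
```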
Latest papers
Best Arm Identification with Fixed Budget: A Large Deviation Perspective
In particular, we present Continuous Rejects (CR), a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
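The paper's exact CR rule is not reproduced here; the following is only a generic fixed-budget sketch of gap-based elimination, in which surviving arms are sampled round-robin and an arm is rejected once its empirical gap to the current leader exceeds a hand-picked threshold. The threshold, budget, and reward distributions are illustrative assumptions.

```python
import numpy as np

def gap_based_elimination(true_means, budget=5_000, threshold=0.15, seed=0):
    """Illustrative fixed-budget loop: sample surviving arms round-robin and
    reject any arm whose empirical gap to the current leader exceeds a threshold.
    A fixed threshold is used purely for illustration; an adaptive rule would calibrate it."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    active = list(range(k))
    counts = np.zeros(k)
    means = np.zeros(k)
    pulls = 0

    while pulls < budget and len(active) > 1:
        for arm in list(active):            # round-robin over surviving arms
            reward = rng.normal(true_means[arm], 1.0)
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]
            pulls += 1
            if pulls >= budget:
                break
        # Rejection step: drop arms whose empirical mean trails the leader by more than the threshold.
        leader = max(active, key=lambda a: means[a])
        active = [a for a in active if a == leader or means[leader] - means[a] <= threshold]

    return max(active, key=lambda a: means[a])   # recommended arm index

best = gap_based_elimination([0.1, 0.3, 0.8, 0.75])
```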
Risk-Aware Continuous Control with Neural Contextual Bandits
Recent advances in learning techniques have garnered attention for their applicability to a diverse range of real-world sequential decision-making problems.
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation.
Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits
We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors.
Bayesian Design Principles for Frequentist Sequential Learning
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles.
A Convex Framework for Confounding Robust Inference
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints
We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints; to the best of our knowledge, this is the first combinatorial bandit setting to consider diversity constraints under bandit feedback.
Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health
RMABs are increasingly being used for sensitive decision-making in areas such as public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health.
Adaptive Linear Estimating Equations
Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes.
Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
We first improve the regret lower bound to $\Omega(K^{1/3}T^{2/3})$ for algorithms with $o(K)$ memory, which matches the uniform exploration regret up to a logarithmic factor in $T$.
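For context, the uniform-exploration baseline referenced above can be sketched in the single-pass streaming setting with O(1) arm memory: each arriving arm is sampled a fixed number of times, only a running champion is stored, and the remaining budget is spent on that champion. The pull counts and arm probabilities below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def single_pass_uniform_exploration(arms, pulls_per_arm, horizon):
    """One streaming pass over the arms with O(1) arm memory: sample each arm a
    fixed number of times, keep only the best empirical arm seen so far, then
    spend the remaining budget exploiting that champion."""
    best_arm, best_mean = None, -np.inf
    spent = 0
    for arm in arms:                                   # single pass over the stream
        mean = float(np.mean([arm() for _ in range(pulls_per_arm)]))
        spent += pulls_per_arm
        if mean > best_mean:                           # store only the running champion
            best_arm, best_mean = arm, mean
    return sum(best_arm() for _ in range(horizon - spent))   # commit to the champion

# Bernoulli arms with hypothetical success probabilities.
arms = [lambda p=p: float(rng.random() < p) for p in (0.2, 0.5, 0.7)]
total_reward = single_pass_uniform_exploration(arms, pulls_per_arm=200, horizon=10_000)
```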