Multi-Armed Bandits

195 papers with code • 1 benchmark • 2 datasets

Multi-armed bandits refer to the task of allocating a fixed, limited amount of resources between competing choices in a way that maximizes expected gain. These problems typically involve an exploration/exploitation trade-off: gathering more information about under-sampled choices versus exploiting the choice that currently looks best.

(Image credit: Microsoft Research)
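
As a toy illustration of the exploration/exploitation trade-off described above, here is a minimal epsilon-greedy sketch: with probability epsilon the learner explores a random arm, otherwise it exploits the arm with the best running estimate. The arm probabilities, epsilon, and horizon are arbitrary illustrative choices, not taken from any paper on this page.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Minimal epsilon-greedy bandit: explore with prob. epsilon, else exploit."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore: random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return total_reward, estimates

# Three arms with unknown (hypothetical) success probabilities.
reward, est = epsilon_greedy([0.2, 0.5, 0.7])
print(reward, est)
```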

Best Arm Identification with Fixed Budget: A Large Deviation Perspective

rctzeng/neurips2023-cr NeurIPS 2023

In particular, we present CR (Continuous Rejects), a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
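
The paper's CR algorithm itself is more involved; as a rough sketch of the general idea (a fixed sampling budget, with arms rejected in any round once their empirical gap to the current leader is large), one might write something like the following. The gap threshold and budget here are invented for illustration and are not the paper's rejection rule.

```python
import random

def fixed_budget_elimination(true_means, budget=2000, gap_threshold=0.15, seed=0):
    """Schematic best-arm identification with a fixed budget: sample active arms
    round-robin and reject an arm whenever its empirical gap to the current
    leader exceeds a (hypothetical) threshold."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    active = set(range(k))
    pulls = 0
    while pulls < budget and len(active) > 1:
        for arm in list(active):          # one round-robin pass over active arms
            if pulls >= budget:
                break
            r = 1.0 if rng.random() < true_means[arm] else 0.0
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
            pulls += 1
        leader = max(active, key=lambda a: means[a])
        worst = min(active, key=lambda a: means[a])
        # Reject as soon as the observed gap is large enough (in any round).
        if worst != leader and means[leader] - means[worst] > gap_threshold:
            active.discard(worst)
    return max(active, key=lambda a: means[a])

print(fixed_budget_elimination([0.3, 0.5, 0.8]))  # expect arm 2 most of the time
```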

19 Dec 2023

Risk-Aware Continuous Control with Neural Contextual Bandits

jaayala/risk_aware_contextual_bandit 15 Dec 2023

Recent advances in learning techniques have garnered attention for their applicability to a diverse range of real-world sequential decision-making problems.

Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits

faaizt/mr-ope NeurIPS 2023

Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation.
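
The paper's marginal density ratio estimator is not reproduced here; for background, this is a sketch of the standard inverse propensity scoring (IPS) baseline that OPE estimators of this kind build on and improve: logged rewards are reweighted by the ratio of target to logging action probabilities. The toy policies and logged data below are hypothetical.

```python
import numpy as np

def ips_estimate(contexts, actions, rewards, logging_propensities, target_policy):
    """Inverse propensity scoring (IPS) for off-policy evaluation: reweight
    logged rewards by pi_target(a|x) / pi_logging(a|x) and average."""
    weights = np.array([
        target_policy(x, a) / p
        for x, a, p in zip(contexts, actions, logging_propensities)
    ])
    return float(np.mean(weights * np.array(rewards)))

# Hypothetical target policy over two actions; it ignores the context and
# plays action 1 with probability 0.8.
def target_policy(x, a):
    return 0.8 if a == 1 else 0.2

contexts = [0, 1, 2, 3]
actions = [1, 0, 1, 1]
rewards = [1.0, 0.0, 1.0, 0.0]
logging_propensities = [0.5, 0.5, 0.5, 0.5]  # uniform logging policy
print(ips_estimate(contexts, actions, rewards, logging_propensities, target_policy))
```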

03 Dec 2023

Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

aistats2024-noisy-psne/midsearch 25 Oct 2023

We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors.

Bayesian Design Principles for Frequentist Sequential Learning

xuyunbei/mab-code 1 Oct 2023

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles.
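
A canonical example of a Bayesian principle yielding a bandit algorithm with strong frequentist regret is Thompson sampling for Bernoulli rewards. The sketch below is that textbook algorithm, not the paper's construction: keep a Beta posterior per arm, sample from each posterior, and play the arm with the largest sample.

```python
import random

def thompson_sampling(true_means, horizon=1000, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors on each arm."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior successes + 1
    beta = [1] * k   # posterior failures + 1
    total = 0.0
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])   # play largest posterior sample
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward        # update posterior on success
        beta[arm] += 1 - reward     # update posterior on failure
        total += reward
    return total

print(thompson_sampling([0.2, 0.5, 0.7]))
```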

A Convex Framework for Confounding Robust Inference

kstoneriv3/confounding-robust-inference 21 Sep 2023

We study policy evaluation of offline contextual bandits subject to unobserved confounders.

Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints

huanghanchi/master-slave-algorithm-for-top-k-bandits 24 Aug 2023

We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints; to the best of our knowledge, this is the first combinatorial bandits setting to consider diversity constraints under bandit feedback.

Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health

google-research/socialgood 17 Aug 2023

Restless multi-armed bandits (RMABs) are increasingly used for sensitive decisions in public health, treatment scheduling, anti-poaching, and, the motivation for this work, digital health.

Adaptive Linear Estimating Equations

mufangying/alee NeurIPS 2023

Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes.

14 Jul 2023

Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits

jhwjhw0123/streaming-regret-minimization-mabs 3 Jun 2023

We first improve the regret lower bound to $\Omega(K^{1/3}T^{2/3})$ for algorithms with $o(K)$ memory, which matches the uniform exploration regret up to a logarithmic factor in $T$.
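
For intuition, the uniform-exploration baseline referenced above can be run in a single pass with memory for only one arm's statistics: explore each arriving arm a fixed number of times and retain the best so far, then commit. The stream, pull counts, and memory model below are simplified assumptions; tuning the per-arm pull count on the order of $(T/K)^{2/3}$ is what gives uniform exploration its $K^{1/3}T^{2/3}$-type regret.

```python
import random

def single_pass_uniform_exploration(arm_stream, pulls_per_arm, rng):
    """Single-pass streaming baseline with O(1) arm memory: sample each
    arriving arm a fixed number of times, keep only the best empirical arm
    seen so far, then commit to it for the rest of the horizon."""
    best_arm, best_mean = None, -1.0
    for arm_id, true_mean in arm_stream:
        mean = 0.0
        for i in range(pulls_per_arm):   # uniform exploration of this arm
            r = 1.0 if rng.random() < true_mean else 0.0
            mean += (r - mean) / (i + 1)
        if mean > best_mean:             # retain a single arm's statistics
            best_arm, best_mean = arm_id, mean
    return best_arm

rng = random.Random(0)
# Hypothetical stream of (arm id, true mean) pairs arriving one at a time.
stream = [(0, 0.3), (1, 0.6), (2, 0.5), (3, 0.8)]
print(single_pass_uniform_exploration(stream, pulls_per_arm=100, rng=rng))
```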
