Multi-Armed Bandits
195 papers with code • 1 benchmark • 2 datasets
Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated between competing choices in a way that maximizes the expected gain. Typically these problems involve an exploration/exploitation trade-off.
(Image credit: Microsoft Research)
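To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy sketch on a synthetic Bernoulli bandit: with probability epsilon the learner explores a random arm, otherwise it exploits the arm with the best empirical mean. The arm means, horizon, and epsilon value are arbitrary example choices, not taken from any paper below.

```python
import numpy as np

def epsilon_greedy_bandit(true_means, horizon=10_000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a stochastic Bernoulli bandit; return total reward and estimates."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)        # number of pulls per arm
    estimates = np.zeros(k)     # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(horizon):
        # Explore with probability epsilon, otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            arm = int(rng.integers(k))
        else:
            arm = int(np.argmax(estimates))

        reward = float(rng.random() < true_means[arm])           # Bernoulli reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
        total_reward += reward

    return total_reward, estimates

# Example: three arms with unknown success probabilities.
reward, est = epsilon_greedy_bandit([0.2, 0.5, 0.7])
```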
Latest papers
Best Arm Identification with Fixed Budget: A Large Deviation Perspective
In particular, we present Continuous Rejects (CR), a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
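The paper's exact CR rule is not reproduced here; the following is only a generic fixed-budget sketch of gap-based elimination, in which surviving arms are sampled round-robin and an arm is rejected once its empirical gap to the current leader exceeds a hand-picked threshold. The threshold, budget, and reward distributions are illustrative assumptions.

```python
import numpy as np

def gap_based_elimination(true_means, budget=5_000, threshold=0.15, seed=0):
    """Illustrative fixed-budget loop: sample surviving arms round-robin and
    reject any arm whose empirical gap to the current leader exceeds a threshold.
    A fixed threshold is used purely for illustration; an adaptive rule would calibrate it."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    active = list(range(k))
    counts = np.zeros(k)
    means = np.zeros(k)
    pulls = 0

    while pulls < budget and len(active) > 1:
        for arm in list(active):            # round-robin over surviving arms
            reward = rng.normal(true_means[arm], 1.0)
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]
            pulls += 1
            if pulls >= budget:
                break
        # Rejection step: drop arms whose empirical mean trails the leader by more than the threshold.
        leader = max(active, key=lambda a: means[a])
        active = [a for a in active if a == leader or means[leader] - means[a] <= threshold]

    return max(active, key=lambda a: means[a])   # recommended arm index

best = gap_based_elimination([0.1, 0.3, 0.8, 0.75])
```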
Risk-Aware Continuous Control with Neural Contextual Bandits
Recent advances in learning techniques have garnered attention for their applicability to a diverse range of real-world sequential decision-making problems.
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation.
Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits
We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors.
Bayesian Design Principles for Frequentist Sequential Learning
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles.
A Convex Framework for Confounding Robust Inference
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints
We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints; to the best of our knowledge, this is the first combinatorial bandit setting to consider diversity constraints under bandit feedback.
Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health
RMABs are increasingly being used for sensitive decision-making in areas such as public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health.
Adaptive Linear Estimating Equations
Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes.
Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
We first improve the regret lower bound to $\Omega(K^{1/3}T^{2/3})$ for algorithms with $o(K)$ memory, which matches the uniform exploration regret up to a logarithmic factor in $T$.
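For context, the uniform-exploration baseline referenced above can be sketched in the single-pass streaming setting with O(1) arm memory: each arriving arm is sampled a fixed number of times, only a running champion is stored, and the remaining budget is spent on that champion. The pull counts and arm probabilities below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def single_pass_uniform_exploration(arms, pulls_per_arm, horizon):
    """One streaming pass over the arms with O(1) arm memory: sample each arm a
    fixed number of times, keep only the best empirical arm seen so far, then
    spend the remaining budget exploiting that champion."""
    best_arm, best_mean = None, -np.inf
    spent = 0
    for arm in arms:                                   # single pass over the stream
        mean = float(np.mean([arm() for _ in range(pulls_per_arm)]))
        spent += pulls_per_arm
        if mean > best_mean:                           # store only the running champion
            best_arm, best_mean = arm, mean
    return sum(best_arm() for _ in range(horizon - spent))   # commit to the champion

# Bernoulli arms with hypothetical success probabilities.
arms = [lambda p=p: float(rng.random() < p) for p in (0.2, 0.5, 0.7)]
total_reward = single_pass_uniform_exploration(arms, pulls_per_arm=200, horizon=10_000)
```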