Multi-Armed Bandits

196 papers with code • 1 benchmark • 2 datasets

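Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices in a way that maximizes expected gain. Typically, these problems involve an exploration/exploitation trade-off: the learner must balance trying options whose payoff is still uncertain against repeating the option that has paid best so far.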
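The exploration/exploitation trade-off can be illustrated with an epsilon-greedy strategy, one of the simplest bandit heuristics. The sketch below is illustrative only and is not taken from any of the papers listed on this page. It assumes Bernoulli rewards, and the function name `epsilon_greedy` and the parameters `true_means`, `steps`, `epsilon`, and `seed` are hypothetical names chosen for this example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on simulated Bernoulli arms (illustrative sketch).

    true_means: assumed success probability of each arm, unknown to the learner.
    Returns (pull counts, estimated means, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

Setting `epsilon` to 0 removes exploration entirely. In this sketch the learner then commits to the first arm it tries and may never discover a better one, which is the failure mode that the trade-off describes.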

Latest papers with no code

Contextual Restless Multi-Armed Bandits with Application to Demand Response Decision-Making

no code yet • 22 Mar 2024

This paper introduces a novel multi-armed bandits framework, termed Contextual Restless Bandits (CRB), for complex online decision-making.

Transfer in Sequential Multi-armed Bandits via Reward Samples

no code yet • 19 Mar 2024

We consider a sequential stochastic multi-armed bandit problem where the agent interacts with the bandit over multiple episodes.

Phasic Diversity Optimization for Population-Based Reinforcement Learning

no code yet • 17 Mar 2024

Furthermore, we construct a dogfight scenario for aerial agents to demonstrate the practicality of the PDO algorithm.

ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment

no code yet • 11 Mar 2024

Traditional commercial DBS devices are only able to deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS).

Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning

no code yet • 8 Mar 2024

However, the availability and time of these health workers are limited resources.

LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

no code yet • 5 Mar 2024

For this issue, this study proposes an algorithm whose regret satisfies $O(\log(T))$ in the setting when the suboptimality gap is lower-bounded.

Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds

no code yet • 1 Mar 2024

Follow-The-Regularized-Leader (FTRL) is known as an effective and versatile approach in online learning, where appropriate choice of the learning rate is crucial for smaller regret.

Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain

no code yet • 29 Feb 2024

In this article, we study gender fairness in personalized pain care recommendations using a real-world application of reinforcement learning (Piette et al., 2022a).

Federated Linear Contextual Bandits with Heterogeneous Clients

no code yet • 29 Feb 2024

The demand for collaborative and private bandit learning across multiple agents is surging due to the growing quantity of data generated from distributed systems.

Batched Nonparametric Contextual Bandits

no code yet • 27 Feb 2024

We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations.