Multi-Armed Bandits
196 papers with code • 1 benchmark • 2 datasets
Multi-armed bandits refer to a class of sequential decision-making tasks in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. Typically these problems involve an exploration/exploitation trade-off: the agent must balance trying arms with uncertain payoffs against repeatedly playing the arm that currently looks best.
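The exploration/exploitation trade-off can be sketched with a minimal ε-greedy agent on a toy Bernoulli bandit. The arm reward probabilities below are hypothetical, chosen only for illustration:

```python
import random

def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit.

    true_means: hypothetical per-arm reward probabilities (illustrative only).
    Returns per-arm value estimates and pull counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # running mean reward per arm
    for _ in range(n_steps):
        if rng.random() < epsilon:      # explore: pick a random arm
            arm = rng.randrange(k)
        else:                           # exploit: pick the best estimate so far
            arm = max(range(k), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values, counts

values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With ε = 0.1 the agent spends roughly 90% of its pulls exploiting, so the arm with the highest true mean ends up played most often while the others are still sampled occasionally.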
(Image credit: Microsoft Research)
Latest papers with no code
Contextual Restless Multi-Armed Bandits with Application to Demand Response Decision-Making
This paper introduces a novel multi-armed bandits framework, termed Contextual Restless Bandits (CRB), for complex online decision-making.
Transfer in Sequential Multi-armed Bandits via Reward Samples
We consider a sequential stochastic multi-armed bandit problem where the agent interacts with the bandit over multiple episodes.
Phasic Diversity Optimization for Population-Based Reinforcement Learning
Furthermore, we construct a dogfight scenario for aerial agents to demonstrate the practicality of the PDO algorithm.
ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment
Traditional commercial DBS devices are only able to deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS).
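The Thompson sampling strategy named in the title above can be sketched for a generic Beta-Bernoulli bandit. This is an illustration of the basic technique only, not the paper's neural or clinical implementation, and the arm probabilities are made up:

```python
import random

def thompson_sampling(true_means, n_steps=5000, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy bandit.

    true_means: hypothetical per-arm reward probabilities (illustrative only).
    Maintains a Beta(alpha, beta) posterior per arm and plays the arm whose
    posterior sample is largest. Returns the pull count per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # 1 + observed successes (uniform prior)
    beta = [1.0] * k   # 1 + observed failures
    pulls = [0] * k
    for _ in range(n_steps):
        # Draw a plausible mean for each arm from its posterior, play the best.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.6])
```

Because posterior sampling naturally shifts pulls toward arms with higher estimated reward while uncertainty remains, the better arm accumulates most of the plays without an explicit exploration parameter.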
Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning
However, the availability and time of these health workers are limited resources.
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
To address this issue, this study proposes an algorithm whose regret satisfies $O(\log(T))$ when the suboptimality gap is lower-bounded.
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds
Follow-The-Regularized-Leader (FTRL) is known as an effective and versatile approach in online learning, where an appropriate choice of learning rate is crucial for achieving small regret.
Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain
In this article, we study gender fairness in personalized pain care recommendations using a real-world application of reinforcement learning (Piette et al., 2022a).
Federated Linear Contextual Bandits with Heterogeneous Clients
The demand for collaborative and private bandit learning across multiple agents is surging due to the growing quantity of data generated from distributed systems.
Batched Nonparametric Contextual Bandits
We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations.