Multi-Armed Bandits
195 papers with code • 1 benchmark • 2 datasets
Multi-armed bandits refer to a task where a fixed amount of resources must be allocated among competing alternatives (the "arms") so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance pulling arms that have paid off well so far against trying arms whose payoffs are still uncertain.
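The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch (not from any of the papers below): with probability epsilon the learner explores a random arm, otherwise it exploits the arm with the best empirical mean. The arm means and parameters here are illustrative assumptions.

```python
import random

def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit over Bernoulli arms."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total += reward
    return counts, total / steps

counts, avg_reward = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough steps the best arm accumulates the most pulls, and the average reward approaches that arm's mean minus the cost of exploration.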
(Image credit: Microsoft Research)
Libraries
Use these libraries to find Multi-Armed Bandits models and implementations.
Most implemented papers
Correlated Multi-armed Bandits with a Latent Random Source
As a result, there are regimes where our algorithm achieves a $\mathcal{O}(1)$ regret as opposed to the typical logarithmic regret scaling of multi-armed bandit algorithms.
Adapting multi-armed bandits policies to contextual bandits scenarios
This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.
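The oracle-based idea can be sketched as follows: keep one online binary classifier per arm as a reward model and pick arms greedily from its predictions. This is a hand-rolled illustration under assumed settings (a tiny SGD logistic model, a made-up two-arm environment), not the paper's implementation.

```python
import math
import random

class LogisticOracle:
    """Tiny online logistic regression, standing in for a black-box classifier."""
    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        g = self.predict(x) - y  # gradient of the log-loss
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]

def contextual_epsilon_greedy(steps=3000, epsilon=0.1, seed=0):
    """Epsilon-greedy contextual bandit with one logistic oracle per arm."""
    rng = random.Random(seed)
    oracles = [LogisticOracle(2), LogisticOracle(2)]
    total = 0.0
    for _ in range(steps):
        x = [1.0, rng.uniform(-1, 1)]  # context: bias term + one feature
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = max(range(2), key=lambda a: oracles[a].predict(x))
        # Hypothetical environment: arm 0 pays off when x[1] > 0, arm 1 otherwise.
        p = 0.8 if (x[1] > 0) == (arm == 0) else 0.2
        reward = 1.0 if rng.random() < p else 0.0
        oracles[arm].update(x, reward)
        total += reward
    return total / steps
```

A uniformly random policy earns about 0.5 per step in this environment; the oracle-guided policy should do noticeably better once the per-arm classifiers have seen enough rewards.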
Bayesian Optimisation over Multiple Continuous and Categorical Inputs
Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges.
Multi-Armed Bandits with Correlated Arms
We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated.
The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms
This finding diverges from the notion of free exploration, which relates to covariate variation, as recently discussed in contextual bandit literature.
Gaussian Gated Linear Networks
We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.
BanditPAM: Almost Linear Time $k$-Medoids Clustering via Multi-Armed Bandits
Current state-of-the-art $k$-medoids clustering algorithms, such as Partitioning Around Medoids (PAM), are iterative and are quadratic in the dataset size $n$ for each iteration, being prohibitively expensive for large datasets.
Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers).
Neural Thompson Sampling
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.
Quantile Bandits for Best Arms Identification
We consider a variant of the best arm identification task in stochastic multi-armed bandits.