Multi-Armed Bandits
227 papers with code • 1 benchmark • 2 datasets
Multi-armed bandits refer to a task in which a fixed, limited amount of resources must be allocated among competing choices (arms) in a way that maximizes expected gain. Typically these problems involve an exploration/exploitation trade-off.
(Image credit: Microsoft Research)
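The exploration/exploitation trade-off can be made concrete with a minimal epsilon-greedy agent. The sketch below is illustrative only, not drawn from any of the papers listed here; the Bernoulli arm probabilities and the epsilon value are made-up assumptions.

```python
import random

def epsilon_greedy(arm_probs, n_rounds=10_000, epsilon=0.1):
    """Minimal epsilon-greedy bandit: explore a random arm with
    probability epsilon, otherwise exploit the best empirical mean."""
    counts = [0] * len(arm_probs)    # pulls per arm
    values = [0.0] * len(arm_probs)  # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(len(arm_probs))                         # explore
        else:
            arm = max(range(len(arm_probs)), key=lambda a: values[a])      # exploit
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0          # Bernoulli reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]                # incremental mean
        total_reward += reward
    return total_reward, values

# Hypothetical arms: the true success probabilities are unknown to the agent.
reward, estimates = epsilon_greedy([0.1, 0.5, 0.7])
print(reward, estimates)
```

Larger epsilon means more exploration (more pulls of bad arms); smaller epsilon risks locking onto a suboptimal arm early, which is exactly the trade-off the methods below address in more principled ways.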
Libraries
Use these libraries to find Multi-Armed Bandits models and implementations.
Most implemented papers
Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling
The DRR framework treats recommendation as a sequential decision-making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between users and recommender systems, which can account for both dynamic adaptation and long-term rewards.
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
Neural Contextual Bandits with UCB-based Exploration
To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.
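As a hedged illustration of UCB-style exploration in general (the classical UCB1 rule for independent arms, not the neural contextual algorithm proposed in this paper), each arm's empirical mean is inflated by a confidence bonus that shrinks as the arm is pulled more often:

```python
import math
import random

def ucb1(arm_probs, n_rounds=10_000):
    """Classical UCB1: play the arm with the highest
    empirical mean + sqrt(2 ln t / n_a) confidence bonus."""
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialize
        else:
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts

# Toy arms with hypothetical success probabilities; pulls should concentrate on the best arm.
print(ucb1([0.2, 0.4, 0.8]))
```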
On-line Adaptative Curriculum Learning for GANs
We argue that less expressive discriminators are smoother and have a general coarse-grained view of the modes map, which forces the generator to cover a wide portion of the data distribution support.
Locally Differentially Private (Contextual) Bandits Learning
We study locally differentially private (LDP) bandits learning in this paper.
Gaussian Gated Linear Networks
We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.
Online Limited Memory Neural-Linear Bandits with Likelihood Matching
To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
Off-Policy Evaluation for Large Action Spaces via Embeddings
Unfortunately, when the number of actions is large, existing OPE estimators -- most of which are based on inverse propensity score weighting -- degrade severely and can suffer from extreme bias and variance.
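To see why inverse propensity score (IPS) weighting struggles when actions are numerous, here is a minimal sketch of the vanilla IPS off-policy value estimate (the baseline this paper improves on, not its proposed estimator); the toy data and variable names are assumptions for illustration.

```python
import numpy as np

def ips_estimate(rewards, logged_probs, target_probs):
    """Vanilla IPS estimator of a target policy's value:
    mean of r_i * pi(a_i|x_i) / pi0(a_i|x_i) over logged interactions."""
    weights = target_probs / logged_probs  # importance weights
    return float(np.mean(weights * rewards))

# Toy logged data: observed rewards, logging-policy propensities, target-policy probabilities.
rewards = np.array([1.0, 0.0, 1.0, 0.0])
logged_probs = np.array([0.5, 0.25, 0.01, 0.2])  # tiny propensities -> huge weights
target_probs = np.array([0.4, 0.3, 0.6, 0.1])
print(ips_estimate(rewards, logged_probs, target_probs))
```

With many actions, the logging policy assigns tiny probabilities to most of them, so the importance weights explode and the estimate's variance (and, with weight clipping, its bias) degrades; embedding-based estimators aim to avoid this.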
Multi-Armed Bandits in Metric Spaces
In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.
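For reference, the Lipschitz condition on the expected payoff can be written as follows (notation assumed here, with the Lipschitz constant absorbed into the metric):

```latex
% Expected payoff \mu over the metric space (X, d) of strategies:
|\mu(x) - \mu(y)| \le d(x, y) \qquad \text{for all } x, y \in X
```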
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.
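As a hedged illustration of the Thompson Sampling heuristic itself (the simple Beta-Bernoulli case, not the linear-payoff contextual setting analyzed in this paper), each arm keeps a Beta posterior over its success probability; on every round one sample is drawn per posterior and the arm with the largest sample is played. Arm probabilities below are made up.

```python
import random

def thompson_sampling(arm_probs, n_rounds=10_000):
    """Beta-Bernoulli Thompson Sampling: sample a mean for each arm from its
    Beta(successes + 1, failures + 1) posterior and play the argmax."""
    n_arms = len(arm_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        samples = [random.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if random.random() < arm_probs[arm]:  # Bernoulli reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Toy arms with hypothetical success probabilities.
print(thompson_sampling([0.3, 0.5, 0.75]))
```

Because sampling from the posterior naturally plays uncertain arms occasionally while favoring arms that look good, exploration and exploitation are balanced without an explicit epsilon or confidence bonus.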