Thompson Sampling
99 papers with code • 0 benchmarks • 0 datasets
Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a belief drawn at random from the posterior distribution over model parameters.
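For the classic Bernoulli bandit, the "randomly drawn belief" is a sample from each arm's Beta posterior; the arm whose sample is largest gets pulled. A minimal sketch (the function names and the uniform Beta(1, 1) prior are illustrative choices, not from any particular paper on this page):

```python
import random

def thompson_choose(successes, failures):
    # Sample one success probability from each arm's Beta posterior
    # (Beta(1 + successes, 1 + failures), i.e. a uniform prior) and
    # pick the arm whose sampled belief is highest.
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bernoulli_bandit(true_probs, n_rounds=2000, seed=0):
    # Simulate Thompson sampling against arms with fixed reward probabilities.
    random.seed(seed)
    k = len(true_probs)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_rounds):
        arm = thompson_choose(successes, failures)
        if random.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because posterior draws for under-explored arms have high variance, bad-looking arms are still occasionally sampled as best, which is exactly the mechanism that balances exploration against exploitation.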
Most implemented papers
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
A Tutorial on Thompson Sampling
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially and must balance exploiting what is known to maximize immediate performance against investing to accumulate new information that may improve future performance.
Adapting multi-armed bandits policies to contextual bandits scenarios
This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.
Randomized Exploration for Non-Stationary Stochastic Linear Bandits
We investigate two perturbation approaches to overcome the conservatism that optimism-based algorithms chronically suffer from in practice.
Thompson Sampling Algorithms for Mean-Variance Bandits
The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the exploration-exploitation tradeoff.
Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes
Stationary stochastic processes (SPs) are a key component of many probabilistic models, such as those for off-the-grid spatio-temporal data.
Neural Thompson Sampling
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling
We introduce a variational Bayesian Recurrent Neural Net recommender system that acts on time series of interactions between the internet platform and the user, and which scales to real world industrial situations.
Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.
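In the linear-payoff contextual setting studied by Agrawal and Goyal, the posterior over the unknown parameter vector is a multivariate Gaussian maintained by Bayesian linear regression, and each round samples a parameter vector rather than per-arm success probabilities. A minimal sketch under those assumptions (function names, the exploration scale `v`, and the fixed arm contexts are illustrative, not the paper's exact algorithm statement):

```python
import numpy as np

def linear_ts_choose(B, f, contexts, v=0.5, rng=None):
    # One Thompson step: sample theta ~ N(B^{-1} f, v^2 B^{-1}) and
    # pick the arm whose context gives the highest sampled payoff.
    rng = np.random.default_rng() if rng is None else rng
    mu = np.linalg.solve(B, f)          # posterior mean of theta
    cov = v ** 2 * np.linalg.inv(B)     # posterior covariance
    theta = rng.multivariate_normal(mu, cov)
    return int(np.argmax(contexts @ theta))

def run_linear_bandit(theta_true, contexts, n_rounds=1000, noise=0.1, seed=0):
    # Simulate linear-payoff Thompson sampling with fixed arm contexts.
    rng = np.random.default_rng(seed)
    d = len(theta_true)
    B, f = np.eye(d), np.zeros(d)       # Bayesian linear regression state
    pulls = np.zeros(len(contexts), dtype=int)
    for _ in range(n_rounds):
        arm = linear_ts_choose(B, f, contexts, rng=rng)
        x = contexts[arm]
        reward = x @ theta_true + noise * rng.standard_normal()
        B += np.outer(x, x)             # rank-one precision update
        f += reward * x
        pulls[arm] += 1
    return pulls
```

The same posterior-sampling loop underlies the neural variants listed above, with the Gaussian linear model replaced by an approximate posterior over network weights.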