Thompson Sampling

98 papers with code • 0 benchmarks • 0 datasets

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a random belief (a sample from the current posterior over reward models) and chooses the action that maximizes the expected reward under that sampled belief.
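The description above can be illustrated with a minimal sketch for the Bernoulli bandit case. This is not taken from any of the papers listed below: the function name `thompson_sampling_bernoulli`, the uniform Beta(1, 1) prior, and the simulated `true_probs` are all assumptions made for illustration.

```python
import random


def thompson_sampling_bernoulli(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its Beta posterior.
        # Assumption: a uniform Beta(1, 1) prior on every arm.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Posterior update: count the outcome for the chosen arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling_bernoulli([0.1, 0.5, 0.9], 2000)
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, they keep being tried early on; as evidence accumulates, the samples concentrate and the arm with the highest estimated reward is chosen most of the time.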

Most implemented papers

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

tensorflow/models ICLR 2018

Advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.

A Tutorial on Thompson Sampling

iosband/ts_tutorial 7 Jul 2017

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Adapting multi-armed bandits policies to contextual bandits scenarios

david-cortes/contextualbandits 11 Nov 2018

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.

Randomized Exploration for Non-Stationary Stochastic Linear Bandits

baekjin-kim/NonstationaryLB 11 Dec 2019

We investigate two perturbation approaches to overcome conservatism that optimism-based algorithms chronically suffer from in practice.

Thompson Sampling Algorithms for Mean-Variance Bandits

ksetdekov/trip_choice_optimizer ICML 2020

The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the exploration-exploitation tradeoff.

Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

YannDubs/Neural-Process-Family NeurIPS 2020

Stationary stochastic processes (SPs) are a key component of many probabilistic models, such as those for off-the-grid spatio-temporal data.

Neural Thompson Sampling

ZeroWeight/NeuralTS ICLR 2021

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.

Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling

finn-no/recsys-slates-dataset 30 Apr 2021

We introduce a variational Bayesian Recurrent Neural Net recommender system that acts on time series of interactions between the internet platform and the user, and which scales to real world industrial situations.

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

Ralami1859/Stochastic-Multi-Armed-Bandit 18 May 2012

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.

Thompson Sampling for Contextual Bandits with Linear Payoffs

yanyangbaobeiIsEmma/Reinforcement-Learning-Contextual-Bandits 15 Sep 2012

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.