Thompson Sampling
99 papers with code • 0 benchmarks • 0 datasets
Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a belief drawn at random from the posterior distribution over model parameters.
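For the classic Bernoulli bandit, the "randomly drawn belief" is a sample from each arm's Beta posterior; the arm whose sample is largest gets pulled. A minimal sketch (the function names and the uniform Beta(1, 1) prior are illustrative choices, not from any particular paper on this page):

```python
import random

def thompson_choose(successes, failures):
    # Sample one success probability from each arm's Beta posterior
    # (Beta(1 + successes, 1 + failures), i.e. a uniform prior) and
    # pick the arm whose sampled belief is highest.
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bernoulli_bandit(true_probs, n_rounds=2000, seed=0):
    # Simulate Thompson sampling against arms with fixed reward probabilities.
    random.seed(seed)
    k = len(true_probs)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_rounds):
        arm = thompson_choose(successes, failures)
        if random.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because posterior draws for under-explored arms have high variance, bad-looking arms are still occasionally sampled as best, which is exactly the mechanism that balances exploration against exploitation.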
Most implemented papers
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
A Tutorial on Thompson Sampling
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially and must balance exploiting what is known to maximize immediate performance against investing to accumulate new information that may improve future performance.
Adapting multi-armed bandits policies to contextual bandits scenarios
This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.
Randomized Exploration for Non-Stationary Stochastic Linear Bandits
We investigate two perturbation approaches to overcome the conservatism that optimism-based algorithms chronically suffer from in practice.
Thompson Sampling Algorithms for Mean-Variance Bandits
The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the exploration-exploitation tradeoff.
Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes
Stationary stochastic processes (SPs) are a key component of many probabilistic models, such as those for off-the-grid spatio-temporal data.
Neural Thompson Sampling
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling
We introduce a variational Bayesian Recurrent Neural Net recommender system that acts on time series of interactions between the internet platform and the user, and which scales to real world industrial situations.
Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.
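In the linear-payoff contextual setting studied by Agrawal and Goyal, the posterior over the unknown parameter vector is a multivariate Gaussian maintained by Bayesian linear regression, and each round samples a parameter vector rather than per-arm success probabilities. A minimal sketch under those assumptions (function names, the exploration scale `v`, and the fixed arm contexts are illustrative, not the paper's exact algorithm statement):

```python
import numpy as np

def linear_ts_choose(B, f, contexts, v=0.5, rng=None):
    # One Thompson step: sample theta ~ N(B^{-1} f, v^2 B^{-1}) and
    # pick the arm whose context gives the highest sampled payoff.
    rng = np.random.default_rng() if rng is None else rng
    mu = np.linalg.solve(B, f)          # posterior mean of theta
    cov = v ** 2 * np.linalg.inv(B)     # posterior covariance
    theta = rng.multivariate_normal(mu, cov)
    return int(np.argmax(contexts @ theta))

def run_linear_bandit(theta_true, contexts, n_rounds=1000, noise=0.1, seed=0):
    # Simulate linear-payoff Thompson sampling with fixed arm contexts.
    rng = np.random.default_rng(seed)
    d = len(theta_true)
    B, f = np.eye(d), np.zeros(d)       # Bayesian linear regression state
    pulls = np.zeros(len(contexts), dtype=int)
    for _ in range(n_rounds):
        arm = linear_ts_choose(B, f, contexts, rng=rng)
        x = contexts[arm]
        reward = x @ theta_true + noise * rng.standard_normal()
        B += np.outer(x, x)             # rank-one precision update
        f += reward * x
        pulls[arm] += 1
    return pulls
```

The same posterior-sampling loop underlies the neural variants listed above, with the Gaussian linear model replaced by an approximate posterior over network weights.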