Thompson Sampling

96 papers with code • 0 benchmarks • 0 datasets

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
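For the classic Bernoulli bandit, "a randomly drawn belief" means keeping a Beta posterior per arm, drawing one sample from each posterior, and pulling the arm whose sample is largest. A minimal stdlib-only sketch (the arm probabilities, priors, and horizon are illustrative choices, not from any particular paper):

```python
import random

def thompson_sampling(true_probs, n_rounds=10000, seed=0):
    """Bernoulli Thompson sampling with a Beta(1, 1) prior on each arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta alpha parameters (prior = Beta(1, 1))
    failures = [1] * n_arms   # Beta beta parameters
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior belief ...
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # ... and play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

successes, failures = thompson_sampling([0.2, 0.5, 0.8])
pulls = [s + f - 2 for s, f in zip(successes, failures)]  # pulls per arm
best = max(range(3), key=pulls.__getitem__)
```

Because arms are sampled from their posteriors rather than ranked by point estimates, under-explored arms keep a chance of being pulled, while the play count still concentrates on the best arm over time.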

Most implemented papers

Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

jkomiyama/multiplaybanditlib 2 Jun 2015

Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it has been shown to achieve an optimal regret bound in the standard single-play MAB problem.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

CoffeeddCat/Multiagent_chainMDP 3 Jul 2015

By parameterizing our learned model with a neural network, we are able to develop a scalable and efficient approach to exploration bonuses that can be applied to tasks with complex, high-dimensional state spaces.

Cascading Bandits for Large-Scale Recommendation Problems

niravnb/Movie-Recommendation-using-Cascading-Bandits 17 Mar 2016

In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend $K$ most attractive items from a large set of $L$ candidate items.

Double Thompson Sampling for Dueling Bandits

HuasenWu/DuelingBandits NeurIPS 2016

This simple algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as its special case.

Stacked Thompson Bandits

jazzbob/stb 28 Feb 2017

We introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement.

Mostly Exploration-Free Algorithms for Contextual Bandits

ctrnh/LinearContextualBandits 28 Apr 2017

We prove that this algorithm is rate optimal without any additional assumptions on the context distribution or the number of arms.

AIXIjs: A Software Demo for General Reinforcement Learning

aslanides/aixijs 22 May 2017

The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL).

Asynchronous Parallel Bayesian Optimisation via Thompson Sampling

kirthevasank/gp-parallel-ts 25 May 2017

We design and analyse variations of the classical Thompson sampling (TS) procedure for Bayesian optimisation (BO) in settings where function evaluations are expensive, but can be performed in parallel.
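In the sequential form of this idea, each evaluation is chosen by drawing one sample path from the Gaussian-process posterior and querying the objective at that path's argmax; the asynchronous variants dispatch such draws whenever a worker frees up. A rough single-worker simulation on a 1-D grid (the RBF kernel, objective, and all constants are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Hypothetical "expensive" function (cheap here, for illustration).
    return np.sin(3 * x) * np.exp(-x)

grid = np.linspace(0.0, 2.0, 200)

def gp_posterior(X, y, length_scale=0.3, noise=1e-4):
    """GP posterior mean and covariance on the grid (RBF kernel)."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)
    K_inv = np.linalg.inv(k(X, X) + noise * np.eye(len(X)))
    Ks = k(grid, X)
    mu = Ks @ K_inv @ y
    cov = k(grid, grid) - Ks @ K_inv @ Ks.T
    return mu, cov

X = list(rng.uniform(0.0, 2.0, 3))   # small initial design
y = [objective(x) for x in X]
for _ in range(15):
    mu, cov = gp_posterior(np.array(X), np.array(y))
    # Thompson step: draw one posterior sample path, query at its argmax.
    path = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)))
    x_next = float(grid[np.argmax(path)])
    X.append(x_next)
    y.append(objective(x_next))

best_x, best_y = max(zip(X, y), key=lambda p: p[1])
```

Because each posterior draw is independent, several workers can hold distinct sample paths at once, which is what makes the procedure natural to parallelise.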

Variational inference for the multi-armed contextual bandit

iurteaga/bandits 10 Sep 2017

The multi-armed bandit setting, and in particular the contextual bandit case, provides one general class of algorithms for optimizing interactions with the world while simultaneously learning how the world operates.

Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling

iurteaga/bandits 10 Sep 2017

Reinforcement learning studies how to balance exploration and exploitation in real-world systems, optimizing interactions with the world while simultaneously learning how the world operates.