Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

ICLR 2018 · Carlos Riquelme, George Tucker, Jasper Snoek

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to posterior samples of the model. At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical. Thus, it is attractive to consider approximate Bayesian neural networks in a Thompson Sampling framework. To understand the impact of using an approximate posterior on Thompson Sampling, we benchmark well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems. We found that many approaches that have been successful in the supervised learning setting underperformed in the sequential decision-making scenario. In particular, we highlight the challenge of adapting slowly converging uncertainty estimates to the online setting.
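The benchmark's simplest strong baseline is Thompson sampling with an exact Bayesian linear regression posterior per arm. Below is a minimal sketch of that idea, assuming a zero-mean Normal-Inverse-Gamma prior over the weights and noise variance of each arm; the class name `LinearFullPosteriorTS` and the hyperparameters `lam`, `a0`, `b0` are illustrative placeholders, not the paper's released code.

```python
import numpy as np

class LinearFullPosteriorTS:
    """Thompson sampling with an exact Bayesian linear regression per arm.

    Each arm keeps a Normal-Inverse-Gamma posterior over (weights, noise
    variance); actions are chosen by sampling weights from the posterior and
    picking the arm with the highest sampled reward. Illustrative sketch only.
    """

    def __init__(self, n_arms, context_dim, lam=0.25, a0=6.0, b0=6.0):
        self.n_arms, self.d = n_arms, context_dim
        self.lam, self.a0, self.b0 = lam, a0, b0
        # Per-arm sufficient statistics: precision matrix, X^T y, y^T y, counts.
        self.precision = [lam * np.eye(context_dim) for _ in range(n_arms)]
        self.xty = [np.zeros(context_dim) for _ in range(n_arms)]
        self.yty = np.zeros(n_arms)
        self.counts = np.zeros(n_arms)

    def _posterior_params(self, arm):
        # Posterior over weights given a zero-mean prior with precision lam*I.
        cov = np.linalg.inv(self.precision[arm])
        mu = cov @ self.xty[arm]
        a = self.a0 + self.counts[arm] / 2.0
        b = self.b0 + 0.5 * (self.yty[arm] - mu @ self.precision[arm] @ mu)
        return mu, cov, a, b

    def select_arm(self, context, rng=np.random):
        sampled_rewards = np.empty(self.n_arms)
        for arm in range(self.n_arms):
            mu, cov, a, b = self._posterior_params(arm)
            sigma2 = b / rng.gamma(a)                          # noise variance ~ Inv-Gamma(a, b)
            beta = rng.multivariate_normal(mu, sigma2 * cov)   # weights ~ N(mu, sigma2 * cov)
            sampled_rewards[arm] = context @ beta
        return int(np.argmax(sampled_rewards))

    def update(self, arm, context, reward):
        # Rank-one update of the chosen arm's sufficient statistics.
        self.precision[arm] += np.outer(context, context)
        self.xty[arm] += reward * context
        self.yty[arm] += reward ** 2
        self.counts[arm] += 1
```

Approximate posteriors over neural networks (variational inference, dropout, bootstrapping, etc.) slot into the same loop by replacing the exact posterior sampling step above with an approximate one, which is exactly the substitution the paper benchmarks.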


Datasets

Mushroom (UCI)

Results from the Paper


Task                | Dataset  | Model                          | Metric            | Value | Global Rank
Multi-Armed Bandits | Mushroom | Linear FullPosterior-MR        | Cumulative regret | 1.82  | #1
Multi-Armed Bandits | Mushroom | NeuralLinear FullPosterior-MR  | Cumulative regret | 1.92  | #2
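Cumulative regret, the metric reported above, is the running sum of the gap between the expected reward of the best arm and that of the arm actually chosen. A minimal sketch of how it could be tracked in a simulated run is shown below, assuming an agent exposing the `select_arm`/`update` interface from the earlier sketch; the environment arrays and Gaussian reward noise are hypothetical stand-ins, not the paper's benchmark code.

```python
import numpy as np

def run_bandit(agent, contexts, expected_rewards, rng=np.random.default_rng(0)):
    """Simulate a contextual bandit run and return the cumulative regret curve.

    contexts:         (T, d) array of context vectors (illustrative input).
    expected_rewards: (T, n_arms) array of true mean rewards per arm (illustrative input).
    """
    regret = np.zeros(len(contexts))
    for t, (x, means) in enumerate(zip(contexts, expected_rewards)):
        arm = agent.select_arm(x)
        reward = means[arm] + rng.normal(scale=0.1)    # noisy observed reward
        agent.update(arm, x, reward)
        regret[t] = means.max() - means[arm]           # instantaneous regret
    return np.cumsum(regret)                           # cumulative regret over time
```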
