Bootstrapped Thompson Sampling and Deep Exploration

1 Jul 2015 · Ian Osband, Benjamin Van Roy

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.

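To make the core idea concrete, here is a minimal sketch of bootstrapped Thompson sampling for a Bernoulli multi-armed bandit. It is an illustrative reading of the abstract, not the authors' implementation: the function name bootstrapped_thompson, the parameter n_prior, and the choice of one artificial success and one artificial failure per arm (a rough analogue of a uniform Beta(1, 1) prior) are all assumptions. Each arm's history is seeded with artificially generated data to induce a prior; at each step, each arm's combined history is resampled with replacement and the agent acts greedily on the resampled means, so no explicit posterior is ever maintained or sampled.

```python
# Illustrative sketch only; names and prior construction are assumptions,
# not taken from the technical note.
import numpy as np

rng = np.random.default_rng(0)

def bootstrapped_thompson(true_means, n_steps=1000, n_prior=1):
    """Bootstrapped Thompson sampling on a Bernoulli bandit.

    true_means: per-arm success probabilities (the unknown environment).
    n_prior: number of artificial successes and failures seeding each arm,
             playing the role of the induced prior.
    Returns total reward collected over n_steps.
    """
    n_arms = len(true_means)
    # Artificially generated data inducing a prior: without it, an arm
    # whose first real observation is a failure could be abandoned forever,
    # which is why the abstract calls the prior critical to exploration.
    histories = [[1.0] * n_prior + [0.0] * n_prior for _ in range(n_arms)]
    total_reward = 0.0
    for _ in range(n_steps):
        # Bootstrap step: resample each arm's combined (observed +
        # artificial) history with replacement and take the mean.
        estimates = [
            rng.choice(h, size=len(h), replace=True).mean()
            for h in histories
        ]
        # Act greedily on the bootstrapped estimates, mirroring how
        # Thompson sampling acts greedily on a posterior sample.
        arm = int(np.argmax(estimates))
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        histories[arm].append(reward)
        total_reward += reward
    return total_reward

print(bootstrapped_thompson([0.3, 0.5, 0.7]))
```

The randomness of each bootstrap resample plays the role of posterior sampling: arms with little data produce highly variable estimates and keep getting explored, while well-observed arms concentrate on their empirical means. The same recipe extends to reinforcement learning and to deep networks by training on bootstrap resamples of experience plus artificial prior data, where explicit posterior inference would be computationally infeasible.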