no code implementations • 30 Jun 2020 • Seungki Min, Ciamac C. Moallemi, Daniel J. Russo
We study the use of policy gradient algorithms to optimize over a class of generalized Thompson sampling policies.
Policy Gradient Methods • Thompson Sampling