21 Jun 2019 • Yuval Lewi, Haim Kaplan, Yishay Mansour
We also bound the regret of those sequences: the worst-case sequences have regret $O(\sqrt{T})$, and the best-case sequences have regret $O(1)$.
Thompson Sampling