# On the Performance of Thompson Sampling on Logistic Bandits

12 May 2019 · Shi Dong, Tengyu Ma, Benjamin Van Roy

We study the logistic bandit, in which rewards are binary with success probability $\exp(\beta a^\top \theta) / (1 + \exp(\beta a^\top \theta))$ and actions $a$ and coefficients $\theta$ are within the $d$-dimensional unit ball. While prior regret bounds for algorithms that address the logistic bandit exhibit exponential dependence on the slope parameter $\beta$, we establish a regret bound for Thompson sampling that is independent of $\beta$...
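To make the setting concrete, the following is a minimal sketch of Thompson sampling on a toy logistic bandit. It is not the authors' implementation. It assumes $d = 2$, a finite action set on the unit circle, and a uniform prior over a finite set of candidate $\theta$ vectors, so that the posterior can be updated exactly by Bayes' rule. All function names (`success_prob`, `unit_vectors`, `thompson_sampling`) are hypothetical.

```python
import math
import random


def success_prob(a, theta, beta):
    """Success probability exp(beta * a.theta) / (1 + exp(beta * a.theta))."""
    z = beta * sum(ai * ti for ai, ti in zip(a, theta))
    # Numerically stable evaluation of the logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def unit_vectors(n):
    """n evenly spaced points on the unit circle (assumes d = 2)."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def thompson_sampling(actions, candidates, true_theta, beta, horizon, rng):
    """Run Thompson sampling; return (cumulative expected regret, log-posterior)."""
    # Assumption: uniform prior over a finite candidate set for theta.
    log_post = [0.0] * len(candidates)
    best = max(success_prob(a, true_theta, beta) for a in actions)
    regret = 0.0
    for _ in range(horizon):
        # 1. Sample a parameter from the current posterior.
        top = max(log_post)
        weights = [math.exp(lp - top) for lp in log_post]
        theta_hat = rng.choices(candidates, weights=weights, k=1)[0]
        # 2. Play the action that is optimal for the sampled parameter.
        a = max(actions, key=lambda x: success_prob(x, theta_hat, beta))
        # 3. Observe a binary reward drawn from the true model.
        p = success_prob(a, true_theta, beta)
        reward = 1 if rng.random() < p else 0
        regret += best - p
        # 4. Bayes update: add the log-likelihood of the observation.
        for i, th in enumerate(candidates):
            q = min(max(success_prob(a, th, beta), 1e-12), 1.0 - 1e-12)
            log_post[i] += math.log(q if reward else 1.0 - q)
    return regret, log_post


# Example run with illustrative values for beta and the horizon.
grid = unit_vectors(8)
regret, log_post = thompson_sampling(
    actions=grid, candidates=grid, true_theta=grid[1],
    beta=5.0, horizon=200, rng=random.Random(0),
)
print(f"cumulative expected regret: {regret:.3f}")
```

The discrete prior is chosen only to keep the posterior update exact and the example short. With a continuous prior over the unit ball, the sampling step would need an approximate posterior sampler instead.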

