# Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits

19 Jul 2018 · Julian Zimmert, Yevgeny Seldin

We derive an algorithm that achieves the optimal (within constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent (OMD) with Tsallis entropy regularization with power $\alpha=1/2$ and reduced-variance loss estimators...
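To make the update described in the abstract concrete, here is a minimal Python sketch of OMD with Tsallis entropy regularization at $\alpha=1/2$. Several details are assumptions that the abstract does not state: the learning-rate schedule `eta = 1/sqrt(t)`, the bisection solver for the normalization constant, and the function names `tsallis_weights` and `tsallis_inf`, which are purely illustrative. The sketch also uses the plain importance-weighted loss estimator rather than the reduced-variance estimators the abstract mentions, so it should be read as an illustration of the technique, not as the authors' exact algorithm.

```python
import math
import random


def tsallis_weights(cum_losses, eta, iters=100):
    """Arm probabilities for OMD with 1/2-Tsallis entropy regularization.

    The minimizer has the form w_i = 4 / (eta * (L_i - x))**2, where the
    scalar x < min(L) is chosen so that the weights sum to one. Here x is
    found by bisection (an assumed, simple choice of solver).
    """
    k = len(cum_losses)
    l_min = min(cum_losses)
    lo = l_min - 2.0 * math.sqrt(k) / eta  # at this x, sum(w) <= 1
    hi = l_min - 2.0 / eta                 # at this x, sum(w) >= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(4.0 / (eta * (l - mid)) ** 2 for l in cum_losses)
        if total > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    w = [4.0 / (eta * (l - x)) ** 2 for l in cum_losses]
    s = sum(w)
    return [wi / s for wi in w]  # renormalize to absorb rounding error


def tsallis_inf(loss_fn, n_arms, horizon, rng):
    """Run the sketch for `horizon` rounds; returns pull counts per arm.

    loss_fn(t, arm) must return a loss in [0, 1] (hypothetical interface).
    """
    cum_est = [0.0] * n_arms  # cumulative estimated losses
    pulls = [0] * n_arms
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed anytime learning-rate schedule
        w = tsallis_weights(cum_est, eta)
        arm = rng.choices(range(n_arms), weights=w)[0]
        loss = loss_fn(t, arm)
        # Plain importance-weighted estimator (not the reduced-variance one).
        cum_est[arm] += loss / w[arm]
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Toy environment: arm 0 always has the smallest loss.
    counts = tsallis_inf(lambda t, a: 0.1 if a == 0 else 0.9,
                         n_arms=3, horizon=2000, rng=random.Random(0))
    print(counts)
```

Because the learning rate depends only on the current round `t`, the sketch needs no knowledge of the time horizon, which matches the "without prior knowledge of the regime and time horizon" claim in the abstract.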
