Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions

23 Mar 2021  ·  Saeed Masoudian, Yevgeny Seldin ·

We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(\Delta,C,T)$ self-bounding constraint the algorithm achieves $\mathcal{O}\left(\left(\sum_{i\neq i^*} \frac{1}{\Delta_i}\right)\log_+\left(\frac{(K-1)T}{\left(\sum_{i\neq i^*} \frac{1}{\Delta_i}\right)^2}\right)+\sqrt{C\left(\sum_{i\neq i^*}\frac{1}{\Delta_i}\right)\log_+\left(\frac{(K-1)T}{C\sum_{i\neq i^*}\frac{1}{\Delta_i}}\right)}\right)$ regret bound, where $T$ is the time horizon, $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max\left(1,\log x\right)$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis, which allows to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multiarmed bandits.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here