Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Consider a player that, in each of $T$ rounds, chooses one of $K$ arms. An adversary chooses the cost of each arm in a bounded interval, together with a sequence of feedback delays $\left\{ d_{t}\right\}$ that is unknown to the player. After picking arm $a_{t}$ at round $t$, the player receives the cost of playing this arm $d_{t}$ rounds later. In cases where $t+d_{t}>T$, this feedback is simply missing. We prove that the EXP3 algorithm (which uses the delayed feedback upon its arrival) achieves a regret of $O\left(\sqrt{\ln K\left(KT+\sum_{t=1}^{T}d_{t}\right)}\right)$. For the case where $\sum_{t=1}^{T}d_{t}$ and $T$ are unknown, we propose a novel doubling trick for online learning with delays and prove that this adaptive EXP3 achieves a regret of $O\left(\sqrt{\ln K\left(K^{2}T+\sum_{t=1}^{T}d_{t}\right)}\right)$. We then consider a two-player zero-sum game where players experience asynchronous delays. We show that even when the delays are large enough that players no longer enjoy the “no-regret property” (e.g., when $d_{t}=O\left(t\log t\right)$), the ergodic average of the strategy profile still converges to the set of Nash equilibria of the game. The result is made possible by choosing an adaptive step size $\eta_{t}$ that is not summable but is square summable, and by proving a “weighted regret bound” for this general case.
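The abstract gives no pseudocode, so the following Python sketch illustrates, under our own assumptions, how EXP3 can consume delayed feedback: each round's importance-weighted cost estimate is queued and applied only once it arrives, $d_t$ rounds later, and feedback that would arrive after round $T$ is dropped. The callbacks `get_cost` and `get_delay` are hypothetical stand-ins for the adversary, and the update uses a fixed step size `eta`; none of this is taken from the paper itself.

```python
import numpy as np

def exp3_delayed(T, K, eta, get_cost, get_delay, rng=None):
    """Minimal EXP3 sketch with delayed bandit feedback.

    Costs are assumed to lie in [0, 1].  `get_cost(t, arm)` and
    `get_delay(t)` are hypothetical adversary callbacks, not part of
    the paper; they return the cost of the chosen arm and the delay
    d_t for round t.
    """
    rng = rng or np.random.default_rng()
    log_w = np.zeros(K)            # log-weights, kept in log space for stability
    pending = {}                   # arrival round -> list of (prob, arm, cost)

    for t in range(T):
        # Sampling distribution from exponential weights.
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(K, p=p)

        # Feedback for round t only becomes available d_t rounds later.
        cost = get_cost(t, arm)
        arrival = t + get_delay(t)
        if arrival < T:            # feedback arriving after round T is simply lost
            pending.setdefault(arrival, []).append((p[arm], arm, cost))

        # Apply every importance-weighted estimate whose feedback arrives now.
        for prob, a, c in pending.pop(t, []):
            c_hat = c / prob       # unbiased cost estimate for the played arm
            log_w[a] -= eta * c_hat

    return log_w
```

Consistent with the stated regret bound, one would expect the step size to be tuned on the order of $\sqrt{\ln K/(KT+\sum_{t=1}^{T}d_{t})}$ when those quantities are known, or set via the paper's doubling trick when they are not.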
