On Distributed Multi-player Multiarmed Bandit Problems in Abruptly Changing Environment

12 Dec 2018  ·  Lai Wei, Vaibhav Srivastava ·

We study the multi-player stochastic multiarmed bandit (MAB) problem in an abruptly changing environment. We consider a collision model in which a player receives reward at an arm if it is the only player to select the arm. We design two novel algorithms, namely, Round-Robin Sliding-Window Upper Confidence Bound\# (RR-SW-UCB\#), and the Sliding-Window Distributed Learning with Prioritization (SW-DLP). We rigorously analyze these algorithms and show that the expected cumulative group regret for these algorithms is upper bounded by sublinear functions of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analytic results with numerical illustrations.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here