A new look at fairness in stochastic multi-armed bandit problems

29 Sep 2021  ·  Guanhua Fang, Ping Li, Gennady Samorodnitsky

We study an important variant of the stochastic multi-armed bandit (MAB) problem that takes fairness into consideration. Instead of directly maximizing the cumulative expected reward, the learner must balance the total reward against the fairness level. In this paper, we present new insights into the MAB problem with fairness and formulate it in a penalization framework, in which a rigorous penalized regret can be defined and a more refined regret analysis becomes possible. Under this framework, we propose a hard-threshold UCB-like algorithm that enjoys several merits, including asymptotic fairness, nearly optimal regret, and a better tradeoff between reward and fairness. We establish both gap-dependent and gap-independent upper bounds, and we provide lower bounds that illustrate the tightness of our theoretical analysis. Extensive experimental results corroborate the theory and demonstrate the superiority of our method over existing approaches.
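The abstract names a hard-threshold UCB-like algorithm in a penalization framework but does not spell out its details. The sketch below is one plausible, minimal reading of that idea, not the paper's actual method: fairness is modeled as a minimum pull fraction per arm, and a hard threshold forces a pull of any under-served arm before the usual UCB1 index is applied. The function name, the `min_fraction` parameter, and the quota rule are illustrative assumptions.

```python
import numpy as np

def fair_hard_threshold_ucb(arms, horizon, min_fraction, rng=None):
    """UCB-style bandit loop with a hard fairness threshold.

    Hypothetical interface (not taken from the paper):
      arms         -- list of callables, each returning a stochastic reward in [0, 1]
      min_fraction -- target minimum fraction of pulls for every arm
    """
    rng = rng or np.random.default_rng(0)
    K = len(arms)
    counts = np.zeros(K)   # number of pulls per arm
    means = np.zeros(K)    # empirical mean reward per arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Hard-threshold step: if some arm falls below its fairness quota,
        # pull the most under-served arm regardless of its UCB index.
        deficits = min_fraction * t - counts
        if deficits.max() > 0:
            arm = int(np.argmax(deficits))
        else:
            # Otherwise play the standard UCB1 index.
            bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
            arm = int(np.argmax(means + bonus))

        reward = arms[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward

    return total_reward, counts / horizon


if __name__ == "__main__":
    # Three Bernoulli arms; each arm is guaranteed roughly 10% of the pulls.
    rng = np.random.default_rng(1)
    arms = [lambda p=p: float(rng.random() < p) for p in (0.2, 0.5, 0.8)]
    reward, fractions = fair_hard_threshold_ucb(arms, horizon=10_000, min_fraction=0.1)
    print(f"total reward: {reward:.0f}, pull fractions: {np.round(fractions, 3)}")
```

In this toy run the pull fractions of the two suboptimal arms stay near the 0.1 quota while the best arm absorbs the remaining budget, which illustrates the reward-versus-fairness tradeoff the paper analyzes; the paper's penalized-regret formulation and its exact thresholding rule may differ.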
