Best arm identification in multi-armed bandits with delayed feedback

29 Mar 2018
Aditya Grover, Todor Markov, Peter Attia, Norman Jin, Nicholas Perkins, Bryan Cheong, Michael Chen, Zi Yang, Stephen Harris, William Chueh, Stefano Ermon

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed...
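
The first half of that claim, that delay inflates the sample complexity of a standard algorithm, can be illustrated with a minimal Python sketch. It assumes Bernoulli arms, the same fixed delay of `delay` rounds on every pull, and no partial feedback, and it uses classic successive elimination (Hoeffding confidence radii with a union bound) as a stand-in. It is not the algorithm proposed in the paper. The function `successive_elimination_delayed` and its parameters are hypothetical names chosen for this illustration.

```python
import math
from collections import deque
import random


def successive_elimination_delayed(means, delay, delta=0.05, seed=0, max_rounds=100_000):
    """Successive elimination where each pull's reward is observed `delay` rounds later.

    Illustrative sketch only: Bernoulli arms, fixed delay, no partial feedback.
    Returns (index of the identified arm, total number of pulls issued).
    """
    k = len(means)
    # One RNG per arm, so arm a's i-th reward is the same regardless of the delay.
    rngs = [random.Random(seed * 1000 + a) for a in range(k)]
    active = set(range(k))
    sums = [0.0] * k   # sum of observed rewards per arm
    counts = [0] * k   # number of observed (completed) pulls per arm
    pending = deque()  # (round at which the reward is revealed, arm, reward)
    pulls = 0

    for t in range(max_rounds):
        # Pull every active arm once; its reward is only revealed at round t + delay.
        for arm in sorted(active):
            reward = 1.0 if rngs[arm].random() < means[arm] else 0.0
            pending.append((t + delay, arm, reward))
            pulls += 1

        # Reveal every reward whose delay has elapsed by the end of round t.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            sums[arm] += reward
            counts[arm] += 1

        # No active arm can be compared until it has at least one observed reward.
        if min(counts[a] for a in active) == 0:
            continue

        # Hoeffding radius, union-bounded over arms and sample counts.
        mean_hat = {a: sums[a] / counts[a] for a in active}
        radius = {
            a: math.sqrt(math.log(4 * k * counts[a] ** 2 / delta) / (2 * counts[a]))
            for a in active
        }
        best_lcb = max(mean_hat[a] - radius[a] for a in active)
        # Drop arms whose upper confidence bound falls below the best lower bound.
        active = {a for a in active if mean_hat[a] + radius[a] >= best_lcb}
        if len(active) == 1:
            return active.pop(), pulls

    # Budget exhausted: fall back to the empirically best active arm.
    best = max(active, key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
    return best, pulls


means = [0.9, 0.6, 0.5, 0.3]
arm_0, pulls_0 = successive_elimination_delayed(means, delay=0)
arm_20, pulls_20 = successive_elimination_delayed(means, delay=20)
print(arm_0, arm_20, pulls_20 - pulls_0)
```

In this toy setup the two runs identify the same arm, and the delayed run issues exactly `delay * len(means)` extra pulls: at every round it sees precisely what the undelayed run saw `delay` rounds earlier, yet it keeps pulling arms that already-available feedback would have eliminated. Partial feedback, as the abstract suggests, is one way to shrink that gap, but it is not modeled here.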
