Fully adaptive algorithm for pure exploration in linear bandits

16 Oct 2017  ·  Liyuan Xu, Junya Honda, Masashi Sugiyama ·

We propose the first fully-adaptive algorithm for pure exploration in linear bandits---the task to find the arm with the largest expected reward, which depends on an unknown parameter linearly. While existing methods partially or entirely fix sequences of arm selections before observing rewards, our method adaptively changes the arm selection strategy based on past observations at each round. We show our sample complexity matches the achievable lower bound up to a constant factor in an extreme case. Furthermore, we evaluate the performance of the methods by simulations based on both synthetic setting and real-world data, in which our method shows vast improvement over existing methods.

PDF Abstract
No code implementations yet. Submit your code now



  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here