ICML 2018 • Chao Tao, Saúl Blanco, Yuan Zhou
We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward.
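To make the problem setup concrete, here is a minimal sketch of the objective only, not of the paper's algorithm. It assumes each arm is represented by a feature vector $x \in \mathbb{R}^d$ with mean reward $x^\top \theta$. The arm vectors, the value of `theta`, and the helper name `best_arm` are all hypothetical, chosen purely for illustration. In the actual problem $\theta$ is unknown and must be estimated from observed rewards.

```python
import numpy as np


def best_arm(arms: np.ndarray, theta: np.ndarray) -> int:
    """Return the index of the arm with the largest mean reward x^T theta.

    arms  : (K, d) array, one (assumed) feature vector per arm
    theta : (d,) parameter vector; known here only for illustration
    """
    means = arms @ theta          # mean reward of each arm is linear in theta
    return int(np.argmax(means))  # the best arm maximizes the mean reward


# Hypothetical instance with d = 2 and K = 3 arms (values are made up).
arms = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.6, 0.6]])
theta = np.array([0.5, 0.3])

means = arms @ theta
best = best_arm(arms, theta)
```

A learner in this setting does not have access to `theta` and would have to identify `best` from the rewards it observes when pulling arms.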