no code implementations • 2 Aug 2022 • Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont
The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by \cite{ADMA}, yielding an algorithm whose expected regret is $O(\frac{L\log(L)}{\Delta}\log(T))$ with $2L$ players, $T$ iterations, and a minimum reward gap $\Delta$.
no code implementations • 28 Sep 2020 • Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont
Multiple-play bandits aim at displaying relevant items at relevant positions on a web page.