no code implementations • NeurIPS 2020 • Arpit Agarwal, Nicholas Johnson, Shivani Agarwal
Here we study a natural generalization, which we term \emph{choice bandits}, where the learner plays a set of up to $k \geq 2$ arms and receives limited relative feedback in the form of a single multiway choice among the pulled arms, drawn from an underlying multiway choice model.
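As a rough illustration of this feedback model, the sketch below simulates one round in which the learner plays a set of arms and observes a single chosen arm. The abstract does not say which choice model is used; the multinomial logit (MNL) rule here, along with the names `draw_choice`, `weights`, and `played`, is an assumption made only for illustration.

```python
import random


def draw_choice(played_arms, weights, rng):
    """Return one arm from played_arms, picked with probability
    proportional to its weight (an MNL-style rule, assumed here)."""
    arm_weights = [weights[a] for a in played_arms]
    return rng.choices(played_arms, weights=arm_weights, k=1)[0]


rng = random.Random(0)

# Hypothetical positive weights for four arms; not taken from the paper.
weights = {0: 1.0, 1: 2.0, 2: 0.5, 3: 4.0}

k = 3               # maximum set size, k >= 2
played = [0, 1, 3]  # the set of arms played this round, at most k arms

# The learner sees only which arm was chosen, not any numeric reward.
chosen = draw_choice(played, weights, rng)
print("chosen arm:", chosen)
```

The only feedback is the identity of the chosen arm, so a learner would have to infer relative arm quality from repeated choices across different played sets.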
no code implementations • 17 Jun 2016 • Nicholas Johnson, Vidyashankar Sivakumar, Arindam Banerjee
The goal in such a problem is to minimize the (pseudo) regret, which is the difference between the total expected loss of the algorithm and the total expected loss of the best fixed vector in hindsight.
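One way to write this definition, under assumed notation that does not appear in the abstract (horizon $T$, loss function $\ell_t$ at round $t$, the algorithm's vector $w_t$, and a feasible set $\mathcal{W}$), is:

$$
\bar{R}_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} \ell_t(w_t)\right] \;-\; \min_{w \in \mathcal{W}} \, \mathbb{E}\!\left[\sum_{t=1}^{T} \ell_t(w)\right]
$$

The first term is the total expected loss of the algorithm, and the second is the total expected loss of the best fixed vector in hindsight.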