Safe Exploration for Optimizing Contextual Bandits

2 Feb 2020 · Rolf Jagerman, Ilya Markov, Maarten de Rijke

Contextual bandit problems are a natural fit for many information retrieval tasks, such as learning to rank, text classification, and recommendation. However, existing learning methods for contextual bandit problems have one of two drawbacks: they either do not explore the space of all possible document rankings (i.e., actions) and thus may miss the optimal ranking, or they present suboptimal rankings to a user and thus may harm the user experience...
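To make the trade-off in the abstract concrete, here is a minimal sketch of a linear contextual bandit that only deviates from a known-safe baseline policy when the proposed action's estimated reward stays within a tolerance of the baseline's. This is an illustration of the safe-exploration idea, not the algorithm proposed in the paper; the epsilon-greedy proposal, the ridge-regression estimates, and parameters such as `safety_margin` and `baseline_action` are all assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, n_features, n_rounds = 5, 10, 1000
epsilon = 0.1              # exploration rate (assumed for illustration)
safety_margin = 0.05       # hypothetical tolerance below the baseline's value
baseline_action = 0        # stand-in for a known-safe production policy

true_theta = rng.normal(size=(n_actions, n_features))  # hidden reward model

# Per-action ridge-regression statistics for estimating rewards.
A = np.stack([np.eye(n_features) for _ in range(n_actions)])
b = np.zeros((n_actions, n_features))

for t in range(n_rounds):
    x = rng.normal(size=n_features)  # observed context (e.g., query features)
    theta_hat = np.stack(
        [np.linalg.solve(A[a], b[a]) for a in range(n_actions)]
    )
    values = theta_hat @ x           # estimated reward for each action

    # Propose an action: explore at random with probability epsilon,
    # otherwise act greedily on the current estimates.
    if rng.random() < epsilon:
        candidate = int(rng.integers(n_actions))
    else:
        candidate = int(np.argmax(values))

    # Safety check: deviate from the baseline only when the proposal's
    # estimated reward is within `safety_margin` of the baseline's.
    if values[candidate] >= values[baseline_action] - safety_margin:
        action = candidate
    else:
        action = baseline_action     # fall back to the safe policy

    reward = true_theta[action] @ x + rng.normal(scale=0.1)
    A[action] += np.outer(x, x)      # update statistics for the chosen action
    b[action] += reward * x
```

The safety check mirrors the tension described above: dropping it yields unconstrained exploration that may harm the user experience, while never deviating from the baseline forgoes exploration and may miss the optimal action.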

