no code implementations • 5 Jun 2020 • Aurélien F. Bibaut, Antoine Chambaz, Mark J. Van Der Laan
To the best of our knowledge, our proposal is the first one to be rate-adaptive for a collection of general black-box contextual bandit algorithms: it achieves the same regret rate as the best candidate.
no code implementations • 5 Mar 2020 • Aurélien F. Bibaut, Antoine Chambaz, Mark J. Van Der Laan
We propose the Generalized Policy Elimination (GPE) algorithm, an oracle-efficient contextual bandit (CB) algorithm inspired by the Policy Elimination algorithm of Dudík et al. (2011).
no code implementations • 13 Dec 2019 • Aurélien F. Bibaut, Ivana Malenica, Nikos Vlassis, Mark J. Van Der Laan
We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies.
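As a rough illustration of the OPE problem setup described above, the following is a minimal sketch of a plain importance-sampling (inverse propensity weighting) estimator in a one-step contextual bandit. It is a textbook baseline, not the estimator proposed in the paper, and every name in it (`ips_estimate`, `simulate_logs`, `target_prob`) as well as the toy environment is an assumption made for this example.

```python
import random


def ips_estimate(logged, target_prob):
    """Importance-sampling estimate of the target policy's mean reward.

    logged: list of (context, action, reward, behavior_prob) tuples, where
            behavior_prob is the probability the logging (behavior) policy
            assigned to the action it actually took.
    target_prob: function (context, action) -> probability of that action
                 under the new (target) policy.
    """
    total = 0.0
    for context, action, reward, behavior_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_prob(context, action) / behavior_prob
        total += weight * reward
    return total / len(logged)


def simulate_logs(n, seed=0):
    """Hypothetical toy data: uniform behavior policy over two actions."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.choice([0, 1])
        action = rng.choice([0, 1])  # behavior policy picks uniformly
        reward = 1.0 if action == context else 0.0
        logs.append((context, action, reward, 0.5))
    return logs


def target_prob(context, action):
    # Hypothetical target policy: always choose the action equal to the context.
    return 1.0 if action == context else 0.0


logs = simulate_logs(10000)
estimate = ips_estimate(logs, target_prob)
print(f"Estimated value of target policy: {estimate:.3f}")
```

In this toy setting the target policy always earns reward 1, so the estimate should land close to 1 even though the data were collected by a uniformly random policy. In sequential RL problems the weights become products over time steps and their variance can grow quickly, which is one reason more refined estimators are studied.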