NeurIPS 2010 • Alex Strehl, John Langford, Sham Kakade, Lihong Li
We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned.
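To make the "partially labeled" setting concrete, the sketch below shows one plausible way to evaluate a new policy from logged data in which only the reward of the chosen action was recorded. It is an illustration under stated assumptions, not the method given in the abstract: it assumes discrete contexts, estimates the logging policy's action probabilities from empirical counts, and uses a clipped inverse-propensity estimate. The names `estimate_propensities`, `offline_value`, the threshold `tau`, and the toy log are all hypothetical.

```python
from collections import Counter


def estimate_propensities(logs):
    """Estimate P(action | context) of the logging policy from counts.

    Assumption: contexts are discrete and hashable, so frequencies are usable.
    """
    context_counts = Counter(x for x, _, _ in logs)
    pair_counts = Counter((x, a) for x, a, _ in logs)
    return {(x, a): n / context_counts[x] for (x, a), n in pair_counts.items()}


def offline_value(logs, policy, propensities, tau=0.05):
    """Clipped inverse-propensity estimate of a policy's average reward.

    Only events where the policy agrees with the logged action contribute,
    because the reward of any other action was never observed.
    """
    total = 0.0
    for x, a, r in logs:
        if policy(x) == a:
            # Clip the estimated propensity from below by tau (an assumed
            # threshold) so rarely logged actions do not blow up the estimate.
            total += r / max(propensities[(x, a)], tau)
    return total / len(logs)


# Toy log of (context, chosen_action, observed_reward) triples (made up).
logs = [
    ("u1", "A", 1.0), ("u1", "A", 0.0),
    ("u1", "B", 1.0), ("u1", "B", 1.0),
    ("u2", "A", 0.0), ("u2", "B", 1.0),
]
propensities = estimate_propensities(logs)
value_always_b = offline_value(logs, lambda x: "B", propensities)
value_always_a = offline_value(logs, lambda x: "A", propensities)
print(value_always_b, value_always_a)
```

On this toy log the always-"B" policy scores higher than the always-"A" policy, which matches the rewards recorded for each action. With continuous or high-dimensional contexts, the count-based propensity estimate would need to be replaced by a learned model.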