1 Aug 2022 • Ben London, Levi Lu, Ted Sandler, Thorsten Joachims
We propose the first boosting algorithm for off-policy learning from logged bandit feedback.