9 Oct 2018 • Rémy Degenne, Thomas Nedelec, Clément Calauzènes, Vianney Perchet
State-of-the-art online learning procedures focus either on selecting the best alternative ("best arm identification") or on minimizing the cost (the "regret").
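To make the two objectives concrete, the following is a minimal Python sketch that runs the standard UCB1 algorithm on simulated Bernoulli arms and reports both quantities: the cumulative pseudo-regret and a recommended "best" arm. UCB1, the function name `ucb1`, and the most-pulled-arm recommendation rule are illustrative assumptions, not the procedure studied in the paper.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms (hypothetical helper, for illustration).

    Returns (pseudo_regret, recommended_arm):
    - pseudo_regret: sum over rounds of (best mean - mean of the pulled arm),
      i.e. the "regret" objective;
    - recommended_arm: the most-pulled arm, one simple (assumed) answer to
      the "best arm identification" question.
    """
    k = len(means)
    if horizon < k:
        raise ValueError("horizon must be at least the number of arms")
    rng = random.Random(seed)  # seeded so runs are reproducible
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Optimistic index: empirical mean plus an exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses the true means
    recommended = max(range(k), key=lambda a: counts[a])
    return regret, recommended


regret, arm = ucb1([0.3, 0.5, 0.7], horizon=5000)
print(f"pseudo-regret={regret:.1f}, recommended arm={arm}")
```

A regret-minimizing policy such as UCB1 concentrates its pulls on the arm that currently looks best, which keeps regret low but leaves few samples for the other arms; a strategy aimed purely at best arm identification would typically spread its samples more evenly.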
22 Jan 2018 • Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, Simon Dollé
Before A/B testing a new version of a recommender system online, it is common practice to run offline evaluations on historical data.
3 Apr 2017 • Thomas Nedelec, Nicolas Le Roux, Vianney Perchet
We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling and Normalized Importance Sampling), detailing the different regimes where they are individually suboptimal.
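The three estimators named above can be sketched in a few lines of Python for the simplest case: one target policy evaluated on rewards logged under a single logging policy, with the probability each policy assigns to every logged action known. The function names are hypothetical, and "Empirical Average" is assumed here to mean the plain mean of the logged rewards, which ignores the change of policy; the paper's exact definitions may differ.

```python
def importance_weights(target_probs, logging_probs):
    # w_i = pi(a_i) / mu(a_i): target over logging probability of each logged action.
    return [p / q for p, q in zip(target_probs, logging_probs)]


def empirical_average(rewards):
    # Assumed meaning: plain mean of logged rewards (low variance, biased
    # whenever the target policy differs from the logging policy).
    return sum(rewards) / len(rewards)


def basic_is(rewards, target_probs, logging_probs):
    # Basic importance sampling: (1/n) * sum_i w_i * r_i (unbiased, can have high variance).
    w = importance_weights(target_probs, logging_probs)
    return sum(wi * r for wi, r in zip(w, rewards)) / len(rewards)


def normalized_is(rewards, target_probs, logging_probs):
    # Normalized importance sampling: sum_i w_i * r_i / sum_i w_i
    # (biased for finite n, usually lower variance than basic IS).
    w = importance_weights(target_probs, logging_probs)
    return sum(wi * r for wi, r in zip(w, rewards)) / sum(w)


rewards = [1.0, 0.0, 1.0]
logging_probs = [0.5, 0.25, 0.25]
target_probs = [0.9, 0.9, 0.1]
print(empirical_average(rewards))                            # 2/3
print(basic_is(rewards, target_probs, logging_probs))        # 2.2/3
print(normalized_is(rewards, target_probs, logging_probs))   # 2.2/5.8
```

On this toy log the weights are 1.8, 3.6 and 0.4, so the three estimators disagree noticeably (about 0.667, 0.733 and 0.379), which illustrates why their relative quality depends on the regime.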