1 code implementation • 10 Oct 2023 • Pan Zhao, Antoine Chambaz, Julie Josse, Shu Yang
Policy learning from observational data is pivotal across many domains; the goal is to learn the optimal treatment-assignment policy while adhering to constraints such as fairness, budget, and simplicity.
no code implementations • 21 Sep 2021 • Ivana Malenica, Rachael V. Phillips, Romain Pirracchio, Antoine Chambaz, Alan Hubbard, Mark J. van der Laan
In this work, we introduce the Personalized Online Super Learner (POSL) -- an online ensembling algorithm for streaming data whose optimization procedure accommodates varying degrees of personalization.
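The abstract describes an online ensembling scheme over candidate learners on streaming data. As a rough illustration of that idea (a generic exponential-weights forecaster, not the POSL algorithm itself, whose weighting and personalization scheme differ), consider:

```python
import numpy as np

def exp_weights_stream(predictors, stream, eta=0.5):
    """Generic online ensembling sketch (NOT POSL itself): maintain
    exponential weights over candidate predictors, updating them after
    each streamed observation according to squared-error loss."""
    k = len(predictors)
    weights = np.ones(k) / k
    forecasts = []
    for x, y in stream:
        preds = np.array([f(x) for f in predictors])
        forecasts.append(float(weights @ preds))  # weighted ensemble forecast
        losses = (preds - y) ** 2
        weights *= np.exp(-eta * losses)          # exponential-weights update
        weights /= weights.sum()
    return forecasts, weights

# Toy stream where the second candidate is exactly right:
# its weight should come to dominate as data accumulate.
stream = [(x, 2.0 * x) for x in np.linspace(0.0, 1.0, 50)]
predictors = [lambda x: 0.5 * x, lambda x: 2.0 * x, lambda x: -x]
forecasts, w = exp_weights_stream(predictors, stream)
```

The hyperparameter `eta` and the squared-error loss are illustrative choices; POSL additionally accommodates varying degrees of personalization across subjects, which this sketch omits.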
1 code implementation • 21 Jul 2021 • Thi Thanh Yen Nguyen, Warith Harchaoui, Lucile Mégret, Cloe Mendoza, Olivier Bouaziz, Christian Neri, Antoine Chambaz
We present several algorithms designed to learn a pattern of correspondence between two data sets in situations where it is desirable to match elements that exhibit a relationship belonging to a known parametric model.
no code implementations • NeurIPS 2021 • Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm.
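One standard device for coping with adaptively collected bandit data is inverse-propensity weighting, which reweights each logged observation by the (known) probability with which the logging policy chose the observed action. The sketch below illustrates IPW-based off-policy value estimation on simulated logged data; it is a baseline illustration only, not the adaptively weighted ERM estimator developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated logged bandit data: context x, binary action a drawn from a
# known logging propensity, observed reward r (illustrative setup).
n = 5000
x = rng.uniform(-1.0, 1.0, n)
prop1 = np.where(x > 0, 0.8, 0.2)          # logging propensity of action 1
a = rng.binomial(1, prop1)
r = np.where(a == 1, x, -x) + rng.normal(0.0, 0.1, n)

def ipw_value(policy):
    """Inverse-propensity-weighted value of a deterministic policy:
    E[ 1{policy(x) = a} / P(a | x) * r ]."""
    p_obs = np.where(a == 1, prop1, 1.0 - prop1)
    match = (policy(x) == a).astype(float)
    return float(np.mean(match / p_obs * r))

v_good = ipw_value(lambda x: (x > 0).astype(int))   # play 1 when x > 0
v_bad = ipw_value(lambda x: (x <= 0).astype(int))   # the opposite policy
```

Here the logging propensities are assumed known; when actions come from a running contextual bandit algorithm, the weights themselves are data-dependent, which is precisely the complication the paper addresses.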
no code implementations • NeurIPS 2021 • Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan
The adaptive nature of the data collected by contextual bandit algorithms, however, makes this difficult: standard estimators are no longer asymptotically normally distributed and classic confidence intervals fail to provide correct coverage.
no code implementations • 5 Jun 2020 • Aurélien F. Bibaut, Antoine Chambaz, Mark J. van der Laan
To the best of our knowledge, our proposal is the first one to be rate-adaptive for a collection of general black-box contextual bandit algorithms: it achieves the same regret rate as the best candidate.
no code implementations • 5 Mar 2020 • Aurélien F. Bibaut, Antoine Chambaz, Mark J. van der Laan
We propose the Generalized Policy Elimination (GPE) algorithm, an oracle-efficient contextual bandit (CB) algorithm inspired by the Policy Elimination algorithm of Dudík et al. (2011).
no code implementations • 31 Mar 2018 • Cheng Ju, Antoine Chambaz, Mark J. van der Laan
Say that the above product is not fast enough and the algorithm for the $G$-component is fine-tuned by a real-valued $h$.
no code implementations • 30 Jun 2016 • Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz
We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend.
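A simple baseline for the budgeted setting the abstract describes is a cost-aware UCB rule: pull the arm with the highest upper-confidence-bound reward-to-cost ratio until the budget runs out. The sketch below is that baseline (single play per round, Bernoulli rewards, fixed known costs), not the authors' algorithm, and it ignores the multiple-plays aspect of the paper.

```python
import numpy as np

def budgeted_ucb(means, costs, budget, seed=0):
    """Cost-aware UCB sketch: at each round, pull the arm maximizing
    UCB(mean reward) / cost, stopping once the remaining budget cannot
    cover even the cheapest arm. Rewards are Bernoulli(means[arm])."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    total_reward, spent, t = 0.0, 0.0, 0
    while spent + min(costs) <= budget:
        t += 1
        if t <= k:
            arm = t - 1                       # initialization: pull each arm once
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(ucb / np.asarray(costs)))
        reward = rng.binomial(1, means[arm])
        counts[arm] += 1
        sums[arm] += reward
        spent += costs[arm]
        total_reward += reward
    return total_reward, counts

# Two arms with equal cost; the better arm should be pulled far more often.
reward, counts = budgeted_ucb(means=[0.2, 0.8], costs=[1.0, 1.0], budget=200.0)
```

With unequal costs the trade-off between cheap low-reward arms and expensive high-reward arms becomes the interesting regime; the ratio rule above is one heuristic for it, under the stated assumptions.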