Search Results for author: Aurélien F. Bibaut

Found 3 papers, 0 papers with code

Rate-adaptive model selection over a collection of black-box contextual bandit algorithms

no code implementations · 5 Jun 2020 · Aurélien F. Bibaut, Antoine Chambaz, Mark J. Van Der Laan

To the best of our knowledge, our proposal is the first one to be rate-adaptive for a collection of general black-box contextual bandit algorithms: it achieves the same regret rate as the best candidate.

Model Selection
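The abstract describes selecting, online, among black-box contextual bandit algorithms so that the meta-learner matches the regret rate of the best candidate. The paper's own rate-adaptive procedure is not reproduced here; as a rough illustration of the meta-selection idea, the sketch below uses a generic EXP3-style selector over two toy base algorithms (the class names, the bandit environment, and all parameters are invented for the example):

```python
import math
import random

class RandomBase:
    """Toy candidate 1: picks an arm uniformly at random (a weak baseline)."""
    def select(self, n_arms):
        return random.randrange(n_arms)
    def update(self, arm, reward):
        pass

class EpsGreedyBase:
    """Toy candidate 2: epsilon-greedy over empirical arm means."""
    def __init__(self, n_arms, eps=0.1):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.eps = eps
    def select(self, n_arms):
        # Explore with probability eps, or until every arm is tried once.
        if random.random() < self.eps or 0 in self.counts:
            return random.randrange(n_arms)
        return max(range(n_arms), key=lambda a: self.sums[a] / self.counts[a])
    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def meta_select(bases, arm_means, horizon=5000, lr=0.05, gamma=0.1, seed=0):
    """EXP3-style meta-selection over black-box base algorithms:
    each round, sample one base from exponential weights (mixed with
    uniform exploration gamma), play the arm it proposes, and credit
    it with an importance-weighted reward."""
    random.seed(seed)
    n_arms = len(arm_means)
    log_w = [0.0] * len(bases)
    for _ in range(horizon):
        total = sum(math.exp(w) for w in log_w)
        probs = [(1 - gamma) * math.exp(w) / total + gamma / len(bases)
                 for w in log_w]
        i = random.choices(range(len(bases)), weights=probs)[0]
        arm = bases[i].select(n_arms)
        reward = 1.0 if random.random() < arm_means[arm] else 0.0  # Bernoulli arm
        bases[i].update(arm, reward)
        log_w[i] += lr * reward / probs[i]  # unbiased importance-weighted gain
    return log_w
```

Run long enough, the selector's weight on the epsilon-greedy candidate should dominate the uniform-random one, mirroring the "track the best candidate" goal; the paper's actual guarantee (matching the best candidate's regret *rate*) is stronger than anything this toy establishes.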

Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits

no code implementations · 5 Mar 2020 · Aurélien F. Bibaut, Antoine Chambaz, Mark J. Van Der Laan

We propose the Generalized Policy Elimination (GPE) algorithm, an oracle-efficient contextual bandit (CB) algorithm inspired by the Policy Elimination algorithm of Dudik et al. (2011).

Multi-Armed Bandits
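GPE itself is oracle-efficient and does not enumerate the policy class; the paper's algorithm is not reproduced here. As a toy illustration of the underlying policy-elimination idea, the sketch below enumerates a two-policy class, explores uniformly, and periodically drops policies whose importance-sampling value estimate trails the leader (function names and parameters are invented for the example):

```python
import random

def ips_value(policy, logs):
    """Importance-sampling estimate of a policy's value from logged
    (context, action, propensity, reward) tuples: a logged reward counts
    only when the policy would have taken the logged action, reweighted
    by the inverse propensity."""
    return sum(r / p for (x, a, p, r) in logs if policy(x) == a) / len(logs)

def policy_elimination(policies, draw_context, reward_fn, n_actions=2,
                       rounds=2000, margin=0.1, seed=0):
    """Toy policy elimination over a small finite policy class:
    explore uniformly over actions, and every 500 rounds drop policies
    whose estimated value trails the current leader by more than
    `margin`. (Illustrative sketch only, not GPE.)"""
    random.seed(seed)
    logs = []
    active = list(policies)
    for t in range(1, rounds + 1):
        x = draw_context()
        a = random.randrange(n_actions)  # pure uniform exploration
        logs.append((x, a, 1.0 / n_actions, reward_fn(x, a)))
        if t % 500 == 0 and len(active) > 1:
            vals = [ips_value(pi, logs) for pi in active]
            best = max(vals)
            active = [pi for pi, v in zip(active, vals) if v >= best - margin]
    return active
```

Real policy-elimination methods explore with carefully chosen (non-uniform) distributions over the surviving policies to control regret; uniform exploration here just keeps the sketch short.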

More Efficient Off-Policy Evaluation through Regularized Targeted Learning

no code implementations · 13 Dec 2019 · Aurélien F. Bibaut, Ivana Malenica, Nikos Vlassis, Mark J. Van Der Laan

We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies.

Causal Inference · Off-Policy Evaluation
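The OPE problem stated in the abstract — estimating a new policy's value from data logged under a different policy — is commonly baselined with importance sampling, which the sketch below shows (the paper's regularized targeted-learning estimator is considerably more involved; the function names and log format here are invented for the example):

```python
def ips_estimate(logs, target_prob):
    """Basic importance-sampling OPE estimator: reweight each logged
    reward by the target policy's probability of the logged action
    divided by the behavior policy's logged propensity.

    `logs` is a list of (context, action, behavior_propensity, reward)
    tuples; `target_prob(context, action)` gives the target policy's
    probability of taking `action` in `context`."""
    weighted = [(target_prob(x, a) / p) * r for (x, a, p, r) in logs]
    return sum(weighted) / len(weighted)
```

On logs from a uniform behavior policy over two actions, with a deterministic target policy that always picks the rewarded action, the estimate recovers that policy's true value; in realistic settings, high-variance importance weights are exactly what doubly-robust and targeted-learning estimators like the paper's aim to tame.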
