no code implementations • 9 May 2022 • Claudia Roberts, Maria Dimakopoulou, Qifeng Qiao, Ashok Chandrashekhar, Tony Jebara
Contextual bandits are online learning frameworks that learn a treatment assignment policy in the presence of treatment effects that vary with the observed contextual features of the users.
no code implementations • NeurIPS 2021 • Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm.
no code implementations • NeurIPS 2021 • Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan
The adaptive nature of the data collected by contextual bandit algorithms, however, makes valid statistical inference difficult: standard estimators are no longer asymptotically normally distributed and classic confidence intervals fail to provide correct coverage.
no code implementations • NeurIPS 2021 • Maria Dimakopoulou, Zhimei Ren, Zhengyuan Zhou
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct inference at each step on the true mean reward of each arm, based on the data collected so far.
no code implementations • ICML 2020 • Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík
We propose a new framework for designing estimators for off-policy evaluation in contextual bandits.
no code implementations • 15 Dec 2018 • Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning.
1 code implementation • NeurIPS 2018 • Maria Dimakopoulou, Ian Osband, Benjamin Van Roy
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale.
no code implementations • ICML 2018 • Maria Dimakopoulou, Benjamin Van Roy
We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment.
no code implementations • 19 Nov 2017 • Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature into their estimation, making it less prone to estimation bias.