Calibrated Optimal Decision Making with Multiple Data Sources and Limited Outcome

21 Apr 2021 · Hengrui Cai, Wenbin Lu, Rui Song

We consider optimal decision making in a primary sample of interest when multiple auxiliary data sources are available. The outcome of interest is limited in the sense that it is observed only in the primary sample. In practice, such data sources may come from heterogeneous studies and thus cannot be combined directly. This paper proposes a new framework that handles heterogeneous samples and addresses the limited outcome simultaneously through a novel calibrated optimal decision-making method, which leverages the intermediate outcomes common to all data sources. Specifically, our method allows the baseline covariates to have either homogeneous or heterogeneous distributions across samples. Assuming that the conditional means of the intermediate outcomes, given baseline covariates and treatment information, are equal across samples, we show that the proposed estimator of the conditional mean outcome is asymptotically normal and more efficient than the estimator that uses the primary sample alone. Extensive experiments on simulated datasets demonstrate the empirical validity and improved efficiency of our approach, followed by a real application to electronic health records.
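To make the efficiency claim concrete, the following is a minimal sketch of the general calibration idea, not the paper's estimator. It assumes a deliberately simplified setting: a single fixed treatment arm, no baseline covariates, one scalar intermediate outcome M observed in both samples with the same mean, and the outcome Y observed only in the primary sample. The function names `calibrated_mean` and `simulate`, the data-generating model, and the sample sizes are all hypothetical choices made for illustration.

```python
import random
import statistics


def calibrated_mean(y_primary, m_primary, m_aux):
    """Estimate E[Y] from the primary sample, calibrated with auxiliary M.

    Assumption (illustrative only): M has the same mean in both samples,
    so the difference of the two sample means of M has expectation zero
    and can be used as a control variate.
    """
    n, n_aux = len(y_primary), len(m_aux)
    y_bar = statistics.fmean(y_primary)
    m_bar_p = statistics.fmean(m_primary)
    m_bar_a = statistics.fmean(m_aux)

    # Sample covariance of (Y, M) in the primary sample.
    cov_ym = sum(
        (y - y_bar) * (m - m_bar_p) for y, m in zip(y_primary, m_primary)
    ) / (n - 1)

    # Variance of (m_bar_p - m_bar_a) for two independent samples.
    var_diff = (
        statistics.variance(m_primary) / n + statistics.variance(m_aux) / n_aux
    )

    # Projection coefficient: Cov(y_bar, m_bar_p - m_bar_a) / Var(m_bar_p - m_bar_a).
    rho = (cov_ym / n) / var_diff
    return y_bar - rho * (m_bar_p - m_bar_a)


def simulate(reps=300, n=50, n_aux=500, seed=0):
    """Return (variance of primary-only mean, variance of calibrated mean)."""
    rng = random.Random(seed)
    naive, calibrated = [], []
    for _ in range(reps):
        # Hypothetical model: Y depends strongly on M plus independent noise.
        m_p = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y_p = [2.0 * m + rng.gauss(0.0, 0.5) for m in m_p]
        m_a = [rng.gauss(0.0, 1.0) for _ in range(n_aux)]
        naive.append(statistics.fmean(y_p))
        calibrated.append(calibrated_mean(y_p, m_p, m_a))
    return statistics.variance(naive), statistics.variance(calibrated)


if __name__ == "__main__":
    var_naive, var_cal = simulate()
    print(f"primary-only variance: {var_naive:.4f}")
    print(f"calibrated variance:   {var_cal:.4f}")
```

In this toy model the calibrated estimate should vary less across replications than the primary-only mean, because the auxiliary sample pins down the mean of M more precisely and M is strongly correlated with Y. The method described in the abstract additionally conditions on baseline covariates and treatment and targets a decision rule, none of which this sketch attempts.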
