no code implementations • 22 May 2023 • Yuqian Zhang, Abhishek Chakrabortty, Jelena Bradic
Notably, we relax the need for a positivity condition, commonly required in the missing data literature, and allow uniform decay of labeling propensity scores with sample size, accommodating faster growth of unlabeled data.
no code implementations • 25 Jan 2022 • Abhishek Chakrabortty, Guorong Dai, Raymond J. Carroll
We propose a family of semi-supervised estimators for the response quantile(s) based on the two data sets, to improve the estimation accuracy compared to the supervised estimator, i.e., the sample quantile from the labeled data.
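For orientation, the supervised baseline named in this excerpt, the sample quantile of the labeled responses, can be sketched as follows. This is a minimal illustration only: the function name `sample_quantile`, the toy data, and the choice of the inverse empirical CDF definition of a quantile are assumptions, and the paper's semi-supervised estimators are not reproduced here.

```python
import math

def sample_quantile(y, tau):
    """Sample tau-quantile of labeled responses y (hypothetical helper).

    Uses the inverse empirical CDF: the smallest observed value whose
    empirical CDF is at least tau. Other quantile definitions exist.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    ys = sorted(y)
    k = max(math.ceil(tau * len(ys)), 1)  # 1-based order-statistic index
    return ys[k - 1]

# Toy labeled responses (illustrative values only).
labeled_y = [3, 1, 4, 1, 5, 9, 2, 6]
median_hat = sample_quantile(labeled_y, 0.5)
```

This baseline uses only the labeled responses; the unlabeled data set plays no role in it.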
no code implementations • 3 Jan 2022 • Abhishek Chakrabortty, Guorong Dai, Eric Tchetgen Tchetgen
Specifically, we consider two such estimands: (a) the average treatment effect and (b) the quantile treatment effect, as prototype cases, in an SS setting, characterized by two available data sets: (i) a labeled data set of size $n$, providing observations for a response and a set of high dimensional covariates, as well as a binary treatment indicator; and (ii) an unlabeled data set of size $N$, much larger than $n$, but without the response observed.
1 code implementation • 14 Apr 2021 • Yuqian Zhang, Abhishek Chakrabortty, Jelena Bradic
Apart from a moderate-sized labeled data set, L, the SS setting is characterized by an additional, much larger, unlabeled data set, U.
no code implementations • 26 Nov 2019 • Abhishek Chakrabortty, Jiarui Lu, T. Tony Cai, Hongzhe Li
Under mild tail assumptions and arbitrarily chosen (working) models for the propensity score (PS) and the outcome regression (OR) estimators, satisfying only some high-level conditions, we establish finite sample performance bounds for the DDR estimator showing its (optimal) $L_2$ error rate to be $\sqrt{s (\log d)/ n}$ when both models are correct, and its consistency and DR properties when only one of them is correct.
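To give a feel for the stated rate $\sqrt{s (\log d)/ n}$, the short sketch below evaluates it numerically. The excerpt does not define $s$ and $d$; the code assumes the conventional reading ($s$ a sparsity level, $d$ the covariate dimension, $n$ the sample size). The function name `l2_rate` and the numeric values are illustrative only and ignore any constants in the actual bounds.

```python
import math

def l2_rate(s, d, n):
    """Evaluate sqrt(s * log(d) / n), the rate quoted in the excerpt.

    Assumed meanings (not defined in the excerpt): s = sparsity level,
    d = number of covariates, n = sample size. Constants are ignored.
    """
    return math.sqrt(s * math.log(d) / n)

# Illustrative values only: the rate halves when n is quadrupled,
# while growing d by a factor of 10 raises it only mildly (via log d).
rate_small_n = l2_rate(s=10, d=1000, n=500)
rate_large_n = l2_rate(s=10, d=1000, n=2000)
rate_large_d = l2_rate(s=10, d=10000, n=500)
```

The comparison shows why such rates remain useful in high dimensions: the dimension enters only through $\log d$.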
1 code implementation • 27 Sep 2018 • Abhishek Chakrabortty, Preetam Nandy, Hongzhe Li
In particular, we assume that the causal structure of the treatment, the confounders, the potential mediators and the response is a (possibly unknown) directed acyclic graph (DAG).
no code implementations • 8 Apr 2018 • Arun Kumar Kuchibhotla, Abhishek Chakrabortty
The third example concerns the restricted eigenvalue condition, required in HD linear regression, which we verify for all sub-Weibull random vectors through a unified analysis, and also prove a more general result related to restricted strong convexity in the process.
no code implementations • 18 Jan 2017 • Abhishek Chakrabortty, Matey Neykov, Raymond Carroll, Tianxi Cai
We consider the recovery of regression coefficients, denoted by $\boldsymbol{\beta}_0$, for a single index model (SIM) relating a binary outcome $Y$ to a set of possibly high dimensional covariates $\boldsymbol{X}$, based on a large but 'unlabeled' dataset $\mathcal{U}$, with $Y$ never observed.
no code implementations • 17 Jan 2017 • Abhishek Chakrabortty, Tianxi Cai
It is often of interest to investigate if and when the unlabeled data can be exploited to improve estimation of the regression parameter in the adopted linear model.