On Robustness of Principal Component Regression

Principal Component Regression (PCR) is a simple but powerful and widely used method. Its effectiveness is well established when the covariates exhibit low-rank structure. However, its ability to handle noisy, missing, and mixed-valued covariates is not understood and remains an important open challenge. As the main contribution of this work, we establish the robustness of PCR in these settings and provide a meaningful finite-sample analysis. In the process, we establish that PCR is equivalent to performing Linear Regression after pre-processing the covariate matrix via Hard Singular Value Thresholding (HSVT). That is, PCR is equivalent to the recently proposed robust variant of the Synthetic Control method in the context of counterfactual analysis using observational data. As an immediate consequence, we obtain a finite-sample analysis of the Robust Synthetic Control (RSC) estimator, which was previously absent. As an important contribution to the Synthetic Control literature, we establish that an (approximate) linear synthetic control exists in the setting of a generalized factor model; traditionally, the existence of a synthetic control is assumed as an axiom. We further discuss a surprising implication of the robustness of PCR with respect to noise: PCR can learn a good predictive model even if the covariates are tactfully transformed to preserve differential privacy. Finally, this work advances the state-of-the-art analysis of HSVT by establishing stronger guarantees with respect to the $\ell_{2, \infty}$-norm rather than the Frobenius norm that is commonly used in the matrix estimation literature, which may be of interest in its own right.
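The stated equivalence between PCR and Linear Regression on an HSVT-processed covariate matrix can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`hsvt`, `pcr_fit_predict`, `hsvt_ols_fit_predict`), the synthetic data, the choice `k = 3`, and the absence of centering or missing-value handling are all illustrative choices not taken from the paper. It compares in-sample predictions only.

```python
import numpy as np


def hsvt(X, k):
    """Hard singular value thresholding: keep only the top-k singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def pcr_fit_predict(X, y, k):
    """PCR (no centering, an assumption): regress y on the top-k principal component scores."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:k, :].T  # n x k matrix of principal component scores
    gamma, *_ = np.linalg.lstsq(scores, y, rcond=None)
    return scores @ gamma


def hsvt_ols_fit_predict(X, y, k):
    """Least squares on the rank-k HSVT approximation of X (minimum-norm solution)."""
    X_k = hsvt(X, k)
    # X_k is rank deficient, so use an explicit cutoff to drop numerically zero singular values.
    beta, *_ = np.linalg.lstsq(X_k, y, rcond=1e-10)
    return X_k @ beta


# Hypothetical synthetic data: approximately low-rank covariates observed with noise.
rng = np.random.default_rng(0)
n, p, k = 50, 8, 3
low_rank = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))
X = low_rank + 0.1 * rng.normal(size=(n, p))
y = low_rank @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

pred_pcr = pcr_fit_predict(X, y, k)
pred_hsvt = hsvt_ols_fit_predict(X, y, k)
max_gap = np.max(np.abs(pred_pcr - pred_hsvt))
print("largest absolute difference between the two predictions:", max_gap)
```

Both procedures project `y` onto the span of the top-`k` left singular vectors of `X`, which is why the two in-sample prediction vectors agree up to floating-point error.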

NeurIPS 2019

Methods