Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers
We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings. We provide new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded $k$th moments. We also provide new algorithmic results on robust distribution learning, as well as robust mean estimation in $\ell_p$-norms. Among our proof techniques is a method for pruning a high-dimensional distribution with bounded $1$st moments to a stable "core" with bounded $2$nd moments, which may be of independent interest.
PDF Abstract