A scalable estimate of the extra-sample prediction error via approximate leave-one-out

30 Jan 2018  ·  Kamiar Rahnama Rad, Arian Maleki

This paper considers out-of-sample risk estimation in high-dimensional settings, where standard techniques such as $K$-fold cross-validation suffer from large biases. Motivated by the low bias of leave-one-out cross-validation (LO), we propose a computationally efficient closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires only minor computational overhead. Under minor assumptions about the data-generating process, we obtain a finite-sample upper bound for $|\text{LO} - \text{ALO}|$. Our theoretical analysis shows that $|\text{LO} - \text{ALO}| \rightarrow 0$ with overwhelming probability as $n,p \rightarrow \infty$, where the dimension $p$ of the feature vectors may be comparable to or even greater than the number of observations $n$. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that $|\text{LO} - \text{ALO}|$ decreases as $n,p$ increase, revealing the excellent finite-sample performance of ALO. We further illustrate the usefulness of the proposed out-of-sample risk estimation method with an example using real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
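
The abstract does not state the ALO formula itself, so the following is only a minimal sketch of the underlying idea in one special case: ridge regression without an intercept, where the classical leverage shortcut recovers the leave-one-out residuals from a single fit and agrees with refitting $n$ times. It is not the paper's general formula for the broader class of regularized estimators. The function names, the simulated data, and the penalty convention $\lambda\|\beta\|_2^2$ are assumptions made for illustration.

```python
import numpy as np

def ridge_loo_shortcut(X, y, lam):
    """Leave-one-out mean squared error for ridge from a single fit."""
    n, p = X.shape
    # G = (X^T X + lam I)^{-1} X^T, computed without an explicit inverse
    G = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    beta = G @ y
    # Diagonal of the hat matrix H = X G (the leverages)
    leverage = np.einsum("ij,ji->i", X, G)
    # Leave-one-out residual_i = in-sample residual_i / (1 - H_ii)
    loo_resid = (y - X @ beta) / (1.0 - leverage)
    return np.mean(loo_resid ** 2)

def ridge_loo_brute_force(X, y, lam):
    """Leave-one-out mean squared error for ridge by refitting n times."""
    n, p = X.shape
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        beta_i = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        errs[i] = (y[i] - X[i] @ beta_i) ** 2
    return errs.mean()

# Simulated data with p > n and a dense (non-sparse) coefficient vector
rng = np.random.default_rng(0)
n, p = 60, 80
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lam = 1.0
fast = ridge_loo_shortcut(X, y, lam)
slow = ridge_loo_brute_force(X, y, lam)
print(f"shortcut: {fast:.6f}  brute force: {slow:.6f}")
```

For squared loss with a ridge penalty the two estimates match up to floating-point error, while the shortcut needs one linear solve instead of $n$. For other losses and regularizers an approximation of this kind is no longer exact, which is the gap that the bound on $|\text{LO} - \text{ALO}|$ described above addresses.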
