Sign Stable Random Projections for Large-Scale Learning

27 Apr 2015 · Ping Li

We study the use of "sign $\alpha$-stable random projections" (where $0<\alpha\leq 2$) for building basic data processing tools in the context of large-scale machine learning applications (e.g., classification, regression, clustering, and near-neighbor search). After processing by sign stable random projections, the inner products of the processed data approximate various types of nonlinear kernels, depending on the value of $\alpha$. This approach thus provides an effective strategy for approximating nonlinear learning algorithms at essentially the cost of linear learning. When $\alpha=2$, the corresponding nonlinear kernel is known to be the arc-cosine kernel. When $\alpha=1$, the procedure approximates the arc-cos-$\chi^2$ kernel (under certain conditions). When $\alpha\rightarrow 0+$, it corresponds to the resemblance kernel. From a practitioner's perspective, the method of sign $\alpha$-stable random projections is ready to be tested in large-scale learning applications, where $\alpha$ can simply be viewed as a tuning parameter. What is missing in the literature is an extensive empirical study showing the effectiveness of sign stable random projections, especially for values of $\alpha$ other than 1 and 2. This paper supplies such a study on a wide variety of classification datasets. In particular, we compare sign stable random projections side by side with the recently proposed "0-bit consistent weighted sampling (CWS)" (Li 2015).
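The procedure described in the abstract can be illustrated with a minimal sketch. It assumes NumPy and draws symmetric $\alpha$-stable entries with the Chambers-Mallows-Stuck method; the function names (`sample_symmetric_stable`, `sign_stable_projection`, `collision_rate`) and the choice of sampler are illustrative assumptions, not the paper's implementation. Each data vector is multiplied by a random matrix with i.i.d. symmetric $\alpha$-stable entries and only the signs are kept.

```python
import numpy as np


def sample_symmetric_stable(alpha, size, rng):
    """Draw symmetric alpha-stable samples (0 < alpha <= 2).

    Uses the Chambers-Mallows-Stuck construction (an assumed choice of
    sampler). alpha = 1 yields Cauchy samples; alpha = 2 yields Gaussian
    samples with variance 2.
    """
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (
        np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha)
    )


def sign_stable_projection(X, k, alpha, seed=0):
    """Project the rows of X onto k stable directions and keep only signs.

    Returns an (n, k) array of 0/1 values, where 1 means the projected
    value is non-negative.
    """
    rng = np.random.default_rng(seed)
    R = sample_symmetric_stable(alpha, (X.shape[1], k), rng)
    return (X @ R >= 0).astype(np.uint8)


def collision_rate(b1, b2):
    """Fraction of projections on which two sign vectors agree."""
    return float(np.mean(b1 == b2))


# Two vectors separated by an angle of pi/4.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
B = sign_stable_projection(X, k=20000, alpha=2.0)
rate = collision_rate(B[0], B[1])
```

For $\alpha=2$ the agreement rate estimates $1-\theta/\pi$, where $\theta$ is the angle between the two vectors, so `rate` should be close to 0.75 here. Changing `alpha` switches the kernel being approximated; for very small `alpha` the sampled values become extremely heavy-tailed, so this naive sampler may need extra numerical care.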
