We give the first algorithm for kernel Nystr\"om approximation that runs in
*linear time in the number of training points* and is provably accurate for all
kernel matrices, without dependence on regularity or incoherence conditions.
The algorithm projects the kernel onto a set of $s$ landmark points sampled by
their *ridge leverage scores*, requiring just $O(ns)$ kernel evaluations and
$O(ns^2)$ additional runtime.
While leverage score sampling has long been known
to give strong theoretical guarantees for Nystr\"om approximation, our
algorithm is the first to make the approach scalable, which it does by employing
a fast recursive sampling scheme. Empirically, we show that it finds more accurate, lower-rank
kernel approximations in less time than popular techniques such as uniformly
sampled Nystr\"om approximation and the random Fourier features method.
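
To make the landmark selection described above concrete, the following is a minimal Python sketch, not the fast recursive scheme itself. It computes *exact* ridge leverage scores as the diagonal of $K(K + \lambda I)^{-1}$, which costs $O(n^3)$ time, so it illustrates what is being sampled rather than how linear runtime is achieved. The Gaussian kernel, the regularization parameter `lam`, and the function names `rbf_kernel`, `ridge_leverage_scores`, `sample_landmarks`, and `nystrom_approximation` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2) (an illustrative choice)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def ridge_leverage_scores(K, lam):
    """Exact ridge leverage scores: diag(K (K + lam I)^{-1}).

    Costs O(n^3); shown only to define the sampling distribution.
    """
    n = K.shape[0]
    # (K + lam I)^{-1} K equals K (K + lam I)^{-1} because the factors commute.
    return np.diag(np.linalg.solve(K + lam * np.eye(n), K)).copy()


def sample_landmarks(scores, s, seed=0):
    """Draw s distinct landmark indices with probability proportional to the scores."""
    rng = np.random.default_rng(seed)
    p = scores / scores.sum()
    return rng.choice(len(scores), size=s, replace=False, p=p)


def nystrom_approximation(X, idx, gamma=1.0):
    """Nystrom approximation C W^+ C^T built from the landmark points X[idx].

    Only the n x s block C needs kernel evaluations, i.e. O(ns) of them.
    The dense n x n product is formed here purely for demonstration.
    """
    C = rbf_kernel(X, X[idx], gamma)  # n x s
    W = C[idx]                        # s x s block on the landmarks
    return C @ np.linalg.pinv(W) @ C.T


# Small usage example on synthetic data (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
K = rbf_kernel(X, X, gamma=0.5)
scores = ridge_leverage_scores(K, lam=1.0)
idx = sample_landmarks(scores, s=30)
K_hat = nystrom_approximation(X, idx, gamma=0.5)
print("relative Frobenius error:", np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```

In this sketch the full kernel matrix is formed only so that the exact scores and the approximation error can be computed; a scalable method would avoid both steps.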