Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms

22 Jan 2018  ·  Junhong Lin, Volkan Cevher ·

We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We first investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds can be retained for distributed SGM provided that the partition level is not too large. We then extend our results to spectral-regularization algorithms (SRA), including kernel ridge regression (KRR), kernel principal component analysis, and gradient methods. Our results are superior to the state-of-the-art theory. Particularly, our results show that distributed SGM has a smaller theoretical computational complexity, compared with distributed KRR and classic SGM. Moreover, even for non-distributed SRA, they provide the first optimal, capacity-dependent convergence rates, considering the case that the regression function may not be in the RKHS.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here