Distributed Delayed Stochastic Optimization

NeurIPS 2011  ·  Alekh Agarwal, John C. Duchi

We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible. In application to distributed optimization, we show $n$-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as $\mathcal{O}(1 / \sqrt{nT})$, which is known to be optimal even in the absence of delays.
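To make the delayed-update pattern in the abstract concrete, here is a minimal single-process sketch of stochastic gradient descent where each applied gradient was computed at a stale iterate. The quadratic objective, noise level, fixed delay `tau`, and step-size schedule are illustrative assumptions, not the paper's exact procedure or constants.

```python
import numpy as np

# Sketch: delayed stochastic gradient descent on a toy quadratic
# f(x) = 0.5 * ||x - x_star||^2. The master applies gradients that a
# worker computed at the iterate from `tau` steps earlier, mimicking
# asynchrony-induced delay. Assumed, illustrative setup only.

rng = np.random.default_rng(0)
dim, tau, T = 5, 4, 2000
x_star = rng.normal(size=dim)      # unknown minimizer of the toy objective
x = np.zeros(dim)                  # master's current iterate
history = [x.copy()]               # past iterates, used to model the delay

for t in range(1, T + 1):
    # Worker's gradient is evaluated at the iterate from tau steps ago.
    stale_x = history[max(0, len(history) - 1 - tau)]
    grad = (stale_x - x_star) + 0.1 * rng.normal(size=dim)  # noisy gradient

    # Master update with a decaying step size ~ 1/sqrt(t).
    step = 0.5 / np.sqrt(t)
    x = x - step * grad
    history.append(x.copy())

print("final error:", np.linalg.norm(x - x_star))
```

Running the loop with and without the delay (set `tau = 0`) gives a rough feel for the paper's message: for smooth problems the staleness slows early progress but does not change the asymptotic rate.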
