Most commonly used distributed machine learning systems are either synchronous or centralized asynchronous. Synchronous algorithms such as AllReduce-SGD perform poorly in a heterogeneous environment, while asynchronous algorithms using a parameter server suffer from 1) a communication bottleneck at the parameter server when there are many workers, and 2) significantly worse convergence when traffic to the parameter server is congested...
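To make the contrast concrete, below is a minimal NumPy sketch (not the paper's implementation) of the two baseline patterns the abstract mentions: a synchronous AllReduce-style step, where every update waits on a global gradient average, versus a decentralized gossip-style update, where each worker steps locally and averages with a single neighbor. The function names (`allreduce_step`, `gossip_step`), the ring topology, and the toy least-squares objective are all illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

N_WORKERS = 4
DIM = 8
LR = 0.1

# Toy least-squares objective split across workers: each worker holds one data shard.
A = [rng.normal(size=(32, DIM)) for _ in range(N_WORKERS)]
b = [a @ rng.normal(size=DIM) for a in A]

def local_grad(x, i):
    """Gradient of worker i's shard of the least-squares loss."""
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# --- Synchronous AllReduce-SGD (illustrative) --------------------------------
# Every worker computes a gradient, then all gradients are averaged (the
# all-reduce); no worker can proceed until the slowest one arrives.
def allreduce_step(x):
    grads = [local_grad(x, i) for i in range(N_WORKERS)]  # implicit global barrier
    avg = np.mean(grads, axis=0)                          # global average (all-reduce)
    return x - LR * avg

# --- Decentralized gossip-style update (illustrative) ------------------------
# Each worker keeps its own model copy, takes a local gradient step, and
# averages only with one neighbor -- no central server, no global barrier.
def gossip_step(xs):
    i = int(rng.integers(N_WORKERS))             # the worker that "wakes up"
    j = (i + 1) % N_WORKERS                      # its neighbor on a ring topology
    xs[i] = xs[i] - LR * local_grad(xs[i], i)    # local SGD step
    avg = 0.5 * (xs[i] + xs[j])                  # pairwise (gossip) average
    xs[i], xs[j] = avg.copy(), avg.copy()
    return xs

def global_loss(x):
    return sum(0.5 * np.mean((A[i] @ x - b[i]) ** 2) for i in range(N_WORKERS))

x_sync = np.zeros(DIM)
xs_dec = [np.zeros(DIM) for _ in range(N_WORKERS)]
for _ in range(200):
    x_sync = allreduce_step(x_sync)
    xs_dec = gossip_step(xs_dec)

print("allreduce loss:", global_loss(x_sync))
print("gossip    loss:", global_loss(np.mean(xs_dec, axis=0)))
```

The point of the sketch is structural: the synchronous step contains a global barrier and a global average, whereas the gossip step touches only two workers at a time, which is what removes both the central-server bottleneck and the straggler problem described above.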
ICML 2018. Method type: Stochastic Optimization.