Anytime Stochastic Gradient Descent: A Time to Hear from all the Workers

In this paper, we focus on approaches to parallelizing stochastic gradient descent (SGD) wherein data is farmed out to a set of workers, the results of which, after a number of updates, are then combined at a central master node. Although such synchronized SGD approaches parallelize well in idealized computing environments, they often fail to realize their promised computational acceleration in practical settings... (read more)

Results in Papers With Code
(↓ scroll down to see all results)