Parallel SGD: When does averaging help?

Consider a number of workers running SGD independently on the same pool of data and averaging their models every once in a while, a common but not well understood practice. We study model averaging as a variance-reducing mechanism and describe two ways in which the frequency of averaging affects convergence...
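The setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the least-squares objective, the synthetic noise-free data, and the names `make_data`, `parallel_sgd`, `num_workers`, `avg_every`, and `lr` are all assumptions introduced here. Workers are simulated sequentially in one process rather than run in parallel.

```python
import random


def make_data(n=200, seed=1):
    """Hypothetical noise-free linear data: y = <w_true, x>."""
    rng = random.Random(seed)
    w_true = [1.0, -2.0, 0.5]
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = sum(wi * xi for wi, xi in zip(w_true, x))
        data.append((x, y))
    return data, w_true


def parallel_sgd(data, dim, num_workers=4, steps=500, avg_every=10,
                 lr=0.05, seed=0):
    """Simulate workers running SGD on a shared pool with periodic averaging."""
    rng = random.Random(seed)
    # All workers start from the same point (the zero vector).
    models = [[0.0] * dim for _ in range(num_workers)]
    for t in range(1, steps + 1):
        for w in models:
            # Each worker draws its own sample from the shared pool.
            x, y = rng.choice(data)
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # SGD step on the loss 0.5 * (<w, x> - y)^2.
            for j in range(dim):
                w[j] -= lr * err * x[j]
        if t % avg_every == 0:
            # Averaging round: every worker restarts from the mean model.
            mean = [sum(w[j] for w in models) / num_workers for j in range(dim)]
            models = [mean[:] for _ in range(num_workers)]
    # Return the average of the final worker models.
    return [sum(w[j] for w in models) / num_workers for j in range(dim)]


data, w_true = make_data()
w_hat = parallel_sgd(data, dim=len(w_true))
```

Setting `avg_every=1` averages after every step, while `avg_every=steps` averages only once at the end, so this single parameter spans the range of averaging frequencies the abstract refers to.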
