no code implementations • 19 Sep 2020 • Negar Foroutan Eghlidi, Martin Jaggi
Although distributed training reduces computation time, the communication overhead of exchanging gradients between workers becomes a scalability bottleneck for the algorithm.
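To make the bottleneck concrete, here is a minimal PyTorch sketch (not the paper's method) of the per-step gradient exchange in synchronous data-parallel SGD; `allreduce_gradients` is a hypothetical helper, and the snippet assumes `torch.distributed` has already been initialized via `init_process_group`.

```python
import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all workers after loss.backward().

    Each call transfers a full copy of the model's gradients per worker,
    so communication volume grows with model size -- the scalability
    bottleneck noted above. (Hypothetical helper for illustration only.)
    """
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum gradients from all workers, then divide to average.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad.div_(world_size)
```

Because this exchange happens at every optimization step, reducing the number of bits sent per step (e.g., by compressing or sparsifying gradients) is the usual lever for restoring scalability.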