no code implementations • 28 Apr 2021 • Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training.
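This entry concerns communication-efficient data-parallel SGD via gradient quantization. As a rough illustration of the general idea only (a QSGD-style uniform stochastic quantizer, not the specific nonuniform scheme proposed in the paper; the function names and parameters below are hypothetical), a minimal sketch:

```python
# Illustrative sketch of communication-efficient SGD via stochastic gradient
# quantization. This is NOT the paper's nonuniform scheme; it uses uniform
# quantization levels purely to show the compress -> communicate -> average flow.
import numpy as np

def quantize(grad, num_levels=4, rng=np.random.default_rng()):
    """Encode a gradient as (norm, sign, small integer levels) via unbiased stochastic rounding."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return norm, np.sign(grad), np.zeros_like(grad, dtype=np.int8)
    scaled = np.abs(grad) / norm * num_levels        # values in [0, num_levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                         # probability of rounding up (keeps the estimate unbiased)
    levels = lower + (rng.random(grad.shape) < prob_up)
    return norm, np.sign(grad), levels.astype(np.int8)  # few bits per coordinate to communicate

def dequantize(norm, sign, levels, num_levels=4):
    """Decode the compressed representation back into a gradient estimate."""
    return norm * sign * levels / num_levels

# Simulated parallel step: each worker sends a compressed gradient, the server averages.
worker_grads = [np.random.randn(10) for _ in range(4)]
decoded = [dequantize(*quantize(g)) for g in worker_grads]
avg_grad = np.mean(decoded, axis=0)                  # used in place of the exact averaged gradient
print(avg_grad)
```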
no code implementations • NeurIPS 2020 • Vitalii Aksenov, Dan Alistarh, Janne H. Korhonen
The ability to leverage large-scale hardware parallelism has been one of the key enablers of the accelerated recent progress in machine learning.
no code implementations • 25 Sep 2019 • Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.