no code implementations • 24 Oct 2022 • Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu
This paper proposes a novel adaptive Top-K in SGD framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance by balancing the trade-off between communication cost and convergence error.