1 code implementation • 31 Dec 2020 • Tianyi Chen, Ziye Guo, Yuejiao Sun, Wotao Yin
This paper proposes an adaptive stochastic gradient descent method for distributed machine learning that can be viewed as the communication-adaptive counterpart of the celebrated Adam method, hence the name CADA.
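To make the idea concrete, below is a minimal, hedged sketch of a communication-adaptive, Adam-style update loop: each worker uploads a fresh gradient only when it has changed enough since its last upload, and the server runs an Adam step on the (possibly stale) aggregate. The least-squares setup, the specific skip rule, and the threshold are illustrative assumptions, not the paper's exact CADA conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, T = 4, 5, 200  # workers, dimension, rounds (all assumed for illustration)

# Each worker k holds a local least-squares objective ||A_k x - b_k||^2 / n_k.
A = [rng.standard_normal((10, d)) for _ in range(K)]
b = [rng.standard_normal(10) for _ in range(K)]

def local_grad(k, x):
    """Gradient of worker k's local objective at x."""
    return 2 * A[k].T @ (A[k] @ x - b[k]) / len(b[k])

x = np.zeros(d)
m, v = np.zeros(d), np.zeros(d)               # Adam first/second moments
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8      # standard Adam hyperparameters
stale = [local_grad(k, x) for k in range(K)]  # last gradient each worker uploaded
uploads = 0
threshold = 1e-3  # skip rule threshold (an assumption, not the paper's rule)

for t in range(1, T + 1):
    for k in range(K):
        g_new = local_grad(k, x)
        # Communication-adaptive step: upload only if the gradient moved
        # enough since this worker's last upload; otherwise the server
        # silently reuses the stale copy.
        if np.sum((g_new - stale[k]) ** 2) >= threshold:
            stale[k] = g_new
            uploads += 1
    g = np.mean(stale, axis=0)  # server aggregates possibly stale gradients
    # Standard Adam update with bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    x -= lr * mhat / (np.sqrt(vhat) + eps)

loss = sum(np.sum((A[k] @ x - b[k]) ** 2) for k in range(K)) / K
```

In this toy run, `uploads` stays below the `K * T` uploads that plain distributed Adam would need, because workers skip rounds where their gradient barely changed.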