1 code implementation • 12 Jun 2021 • Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu
Momentum stochastic gradient descent uses the accumulated gradient as the update direction for the current parameters, which yields faster training.
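To make the accumulated-gradient update concrete, here is a minimal sketch of a single momentum SGD step; the function name, hyperparameter defaults (lr, beta), and toy objective are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def momentum_sgd_step(theta, grad, velocity, lr=0.01, beta=0.9):
    """One momentum SGD step (illustrative, not the paper's exact variant).

    The velocity accumulates past gradients, and the parameters move
    along this accumulated direction rather than the raw gradient.
    """
    velocity = beta * velocity + grad   # accumulate gradient history
    theta = theta - lr * velocity       # step along the accumulated direction
    return theta, velocity

# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
velocity = np.zeros_like(theta)
for _ in range(100):
    theta, velocity = momentum_sgd_step(theta, 2.0 * theta, velocity)
print(theta)  # close to the minimizer at the origin
```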
2 code implementations • 12 Jun 2021 • Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu
The adaptive gradient algorithm (AdaGrad) and its variants, such as RMSProp, Adam, and AMSGrad, have been widely used in deep learning.
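For reference, the common idea behind these methods is a per-coordinate learning rate scaled by accumulated squared gradients; the sketch below shows the standard AdaGrad update, with hyperparameter defaults (lr, eps) chosen as illustrative assumptions rather than values from the paper.

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.01, eps=1e-8):
    """One AdaGrad step (illustrative sketch of the standard update).

    Each coordinate's effective learning rate shrinks as its running
    sum of squared gradients grows, so frequently updated coordinates
    take smaller steps.
    """
    accum = accum + grad ** 2                           # per-coordinate squared-gradient sum
    theta = theta - lr * grad / (np.sqrt(accum) + eps)  # scaled update
    return theta, accum
```

Variants like RMSProp and Adam replace the raw sum with an exponential moving average (and, in Adam's case, also average the gradient itself), which keeps the effective step size from decaying to zero.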