no code implementations • 24 Dec 2020 • Zedong Tang, Fenlong Jiang, Junke Song, Maoguo Gong, Hao Li, Fan Yu, Zidong Wang, Min Wang
Optimizers that further adjust the scale of the gradient, such as Adam and Natural Gradient (NG), are widely studied and used by the community, yet they are often found to generalize worse than Stochastic Gradient Descent (SGD).
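For reference, the contrast the abstract draws can be made concrete with the textbook update rules of SGD and Adam (standard forms from the literature, not equations taken from this paper):

\[
\text{SGD:}\quad \theta_{t+1} = \theta_t - \eta\, g_t
\]
\[
\text{Adam:}\quad m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t,\qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,\qquad
\theta_{t+1} = \theta_t - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\]

The per-coordinate division by \(\sqrt{\hat v_t}\) (with bias-corrected moments \(\hat m_t, \hat v_t\)) is the "adjustment of the gradient scale" the abstract refers to; SGD applies no such rescaling, which is the setting in which it is often observed to generalize better.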