1 Jan 2021 • Xunpeng Huang, Vicky Jiaqi Zhang, Hao Zhou, Lei LI
Adaptive gradient methods have been shown to outperform stochastic gradient descent (SGD) on many neural network training tasks.