A Qualitative Study of the Dynamic Behavior of Adaptive Gradient Algorithms

14 Sep 2020 · Chao Ma, Lei Wu, Weinan E

The dynamic behavior of the RMSprop and Adam algorithms is studied through a combination of careful numerical experiments and theoretical explanations. Three types of qualitative features are observed in the training loss curve: fast initial convergence, oscillations, and large spikes. The sign gradient descent (signGD) algorithm, which is the limit of Adam when the learning rate is taken to $0$ while the momentum parameters are kept fixed, is used to explain the fast initial convergence. For the late phase of Adam, three different qualitative patterns are observed depending on the choice of hyper-parameters: oscillations, spikes, and divergence. In particular, Adam converges faster and more smoothly when the values of the two momentum factors are close to each other.
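The signGD limit mentioned in the abstract can be made concrete with a small numerical sketch (not from the paper; the toy quadratic loss and all hyper-parameter values are illustrative assumptions). It compares the signGD update, $x \leftarrow x - \eta\,\mathrm{sign}(\nabla L(x))$, with the standard Adam update on the same problem; early in training, Adam's bias-corrected ratio $\hat m/\sqrt{\hat v}$ behaves like the gradient's sign, which is the mechanism the paper uses to explain fast initial convergence.

```python
import numpy as np

def sign_gd(grad_fn, x0, lr=0.01, steps=100):
    """Sign gradient descent: x <- x - lr * sign(grad(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * np.sign(grad_fn(x))
    return x

def adam(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Standard Adam with bias correction."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (momentum) estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction
        v_hat = v / (1 - beta2 ** t)
        # Early on, m_hat / sqrt(v_hat) is close to sign(g),
        # so the update resembles signGD with step size lr.
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Illustrative ill-conditioned quadratic: L(x) = 0.5 * (x1^2 + 100 * x2^2)
grad = lambda x: np.array([1.0, 100.0]) * x
x0 = [1.0, 1.0]
```

Note that signGD moves each coordinate at the same rate regardless of curvature, which is why both coordinates of the toy problem shrink at the same speed despite the factor-of-100 difference in their gradients.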
