Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)

11 Jun 2020 · Sharan Vaswani, Frederik Kunstner, Issam Laradji, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

As adaptive gradient methods are typically used for training over-parameterized models capable of exactly fitting the data, we study their convergence in this interpolation setting. Under this assumption, we prove that constant step-size, zero-momentum variants of Adam and AMSGrad can converge to the minimizer at the O(1/T) rate for smooth, convex functions...
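To make the setting concrete, here is a minimal sketch (not the authors' code) of a constant step-size, zero-momentum AMSGrad-style update on a toy least-squares problem where interpolation holds. The problem setup and the hyperparameters `eta`, `beta2`, and `eps` are illustrative assumptions, not values from the paper.

```python
# Sketch: zero-momentum AMSGrad with a constant step size on an
# interpolating least-squares problem (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = rng.standard_normal(20)
b = A @ x_true  # interpolation holds: the model can fit the data exactly

def grad(x, idx):
    # Stochastic gradient of 0.5 * ||A[idx] x - b[idx]||^2 over a minibatch.
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

x = np.zeros(20)
v = np.zeros(20)      # exponential moving average of squared gradients
v_hat = np.zeros(20)  # running maximum of v, as in AMSGrad
eta, beta2, eps = 0.1, 0.99, 1e-8  # assumed values for illustration

for t in range(2000):
    idx = rng.choice(len(b), size=10, replace=False)
    g = grad(x, idx)
    v = beta2 * v + (1 - beta2) * g**2
    v_hat = np.maximum(v_hat, v)               # AMSGrad max correction
    x = x - eta * g / (np.sqrt(v_hat) + eps)   # zero momentum: step along g

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```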
