Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)

11 Jun 2020 · Sharan Vaswani, Frederik Kunstner, Issam Laradji, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

Since adaptive gradient methods are typically used to train over-parameterized models capable of exactly fitting the data, we study their convergence in this interpolation setting. Under this assumption, we prove that constant step-size, zero-momentum variants of Adam and AMSGrad can converge to the minimizer at the O(1/T) rate for smooth, convex functions...
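The zero-momentum variants referred to in the abstract are Adam and AMSGrad with the first-moment coefficient set to zero (β1 = 0), run with a constant step size. As a rough illustration of that setting (not the paper's code), here is a minimal NumPy sketch of zero-momentum AMSGrad applied to a toy interpolation problem; the specific system, step size, and iteration count are arbitrary choices for the example.

```python
import numpy as np

def amsgrad_zero_momentum(grad, x0, steps=5000, lr=0.01, beta2=0.999, eps=1e-8):
    """Constant step-size AMSGrad with the momentum term switched off (beta1 = 0).

    v_hat keeps the running max of the second-moment estimate, so the
    effective per-coordinate step size never increases.
    """
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)      # exponential moving average of squared gradients
    v_hat = np.zeros_like(x)  # AMSGrad: elementwise max of v over time
    for _ in range(steps):
        g = grad(x)
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        x -= lr * g / (np.sqrt(v_hat) + eps)
    return x

# Toy interpolation problem: a consistent, under-determined linear system
# (3 unknowns, 2 equations), so the least-squares loss can be driven to zero.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A.T @ (A @ x - b)

x_hat = amsgrad_zero_momentum(grad, np.zeros(3))
print(np.linalg.norm(A @ x_hat - b))  # residual should be close to 0
```

Because the system is consistent, the gradient vanishes at every solution, which is the interpolation property the paper's constant step-size analysis relies on.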
