Decaying momentum helps neural network training

ICLR 2020 John ChenCameron WolfeZhao LiAnastasios Kyrillidis

Momentum is a simple and popular technique in deep learning for gradient-based optimizers. We propose a decaying momentum (Demon) rule, motivated by decaying the total contribution of a gradient to all future updates... (read more)

PDF Abstract

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper