ADA+: A GENERIC FRAMEWORK WITH MORE ADAPTIVE EXPLICIT ADJUSTMENT FOR LEARNING RATE
Although adaptive algorithms have achieved significant success in training deep neural networks thanks to their faster training speed, they tend to generalize worse than SGD with Momentum (SGDM). Padam, one of the state-of-the-art algorithms, was proposed to close the generalization gap of adaptive methods, but it lacks an internal explanation. This work proposes a general framework in which an explicit function Φ(·) adjusts the actual step size, and presents a more adaptive specific form, AdaPlus (Ada+). Based on this framework, we analyze the behaviors produced by different types of Φ(·): a constant function in SGDM, a linear function in Adam, a concave function in Padam, and a concave function with an offset term in AdaPlus. Empirically, we conduct experiments on classic benchmarks with both CNN and RNN architectures and achieve better performance (even compared with SGDM).
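The role of Φ(·) can be illustrated with a minimal NumPy sketch. The abstract does not state what Φ is applied to or give the exact AdaPlus formula, so everything below is an assumption: Φ is taken to act on x = 1/(√v + ε), where v is the running second-moment estimate; the function names, the exponent p, and the `offset` parameter are hypothetical; and bias correction and Padam's max-based second-moment update are omitted for brevity.

```python
import numpy as np

def phi_constant(x):
    # Constant adjustment: the step ignores the second moment (SGDM-like).
    return np.ones_like(x)

def phi_linear(x):
    # Linear adjustment: step scales with 1/sqrt(v) (Adam-like).
    return x

def phi_concave(x, p=0.125):
    # Concave adjustment: x**(2p) with 0 < p < 1/2 gives 1/v**p (Padam-like).
    return x ** (2 * p)

def phi_concave_offset(x, p=0.125, offset=1.0):
    # HYPOTHETICAL placeholder for "concave function with offset term".
    # The real AdaPlus form is not given in the abstract.
    return (x + offset) ** (2 * p)

def step(theta, grad, m, v, phi, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update of the assumed generic rule: theta <- theta - lr * phi(x) * m."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    x = 1.0 / (np.sqrt(v) + eps)             # assumed input to phi
    theta = theta - lr * phi(x) * m
    return theta, m, v

theta0 = np.array([1.0, -2.0])
grad0 = np.array([0.5, -0.25])
zeros = np.zeros_like(theta0)

# Same gradient, four different adjustments of the step size.
for name, phi in [("constant", phi_constant), ("linear", phi_linear),
                  ("concave", phi_concave), ("concave+offset", phi_concave_offset)]:
    theta1, _, _ = step(theta0, grad0, zeros, zeros, phi)
    print(name, theta1)
```

Under these assumptions, swapping `phi` is the only change needed to move between the SGDM-like, Adam-like, and Padam-like updates, which is the sense in which the framework treats them as instances of one rule.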