PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization

25 Sep 2019 · Jun Liu, Beitong Zhou, Weigao Sun, Ruijuan Chen, Claire J. Tomlin, Ye Yuan

In this paper, we propose a novel technique for improving the stochastic gradient descent (SGD) method to train deep networks, which we term \emph{PowerSGD}. The proposed PowerSGD method simply raises the stochastic gradient to a power $\gamma\in[0,1]$ at each iteration and thus introduces only one additional hyper-parameter, the power exponent $\gamma$ (when $\gamma=1$, PowerSGD reduces to SGD). We further propose PowerSGD with momentum, which we term \emph{PowerSGDM}, and provide convergence rate analyses for both PowerSGD and PowerSGDM. Experiments are conducted on popular deep learning models and benchmark datasets. Empirical results show that the proposed PowerSGD and PowerSGDM achieve faster initial training than adaptive gradient methods, comparable generalization ability to SGD, and improved robustness to hyper-parameter selection and vanishing gradients. PowerSGD is essentially a gradient modifier via a nonlinear transformation. As such, it is orthogonal and complementary to other techniques for accelerating gradient-based optimization.
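
Below is a minimal sketch of the update rules as we read them from the abstract: the stochastic gradient is assumed to be transformed element-wise by a sign-preserving power $\mathrm{sign}(g)\,|g|^{\gamma}$ before a plain SGD (or heavy-ball momentum) step. The function names, the momentum coefficient `beta`, and the element-wise form of the transform are our assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np

def power_transform(grad, gamma):
    """Assumed element-wise sign-preserving power: sign(g) * |g|**gamma.
    With gamma = 1 this is the identity, recovering plain SGD."""
    return np.sign(grad) * np.abs(grad) ** gamma

def powersgd_step(params, grad, lr, gamma):
    """One PowerSGD step (sketch): an SGD update on the powered gradient."""
    return params - lr * power_transform(grad, gamma)

def powersgdm_step(params, grad, momentum, lr, gamma, beta=0.9):
    """One PowerSGDM step (sketch): heavy-ball momentum on the powered gradient.
    `beta` is a hypothetical momentum coefficient, not taken from the paper."""
    momentum = beta * momentum + power_transform(grad, gamma)
    return params - lr * momentum, momentum

# Toy usage on the quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([2.0, -1.0, 0.5])
m = np.zeros_like(w)
for _ in range(100):
    g = w  # gradient of the toy loss at the current iterate
    w, m = powersgdm_step(w, g, m, lr=0.1, gamma=0.6)
```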
