Stochastic Optimization

Quasi-Hyperbolic Momentum (QHM) is a simple alteration of momentum SGD that averages a plain SGD step with a momentum step. QHAdam is the QH-augmented version of Adam: both of Adam's moment estimators are replaced with quasi-hyperbolic terms. In the weight update, QHAdam decouples the momentum term from the current gradient, and the mean squared gradient term from the current squared gradient.

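In essence, the numerator of the update is a weighted average of the momentum term and the current gradient (the plain SGD direction): the momentum term is weighted by the immediate discount factor $v_{1}$ and the current gradient by $1 - v_{1}$. This is divided by the square root of a weighted average of the mean squared gradient estimate and the current squared gradient: the estimate is weighted by the immediate discount factor $v_{2}$ and the current squared gradient by $1 - v_{2}$.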

$$ \theta_{t+1} = \theta_{t} - \eta\left[\frac{\left(1-v_{1}\right)\cdot g_{t} + v_{1}\cdot\hat{m}_{t}}{\sqrt{\left(1-v_{2}\right)\cdot g^{2}_{t} + v_{2}\cdot\hat{v}_{t}} + \epsilon}\right], \quad \forall t $$
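
The update above uses the bias-corrected moment estimates $\hat{m}_{t}$ and $\hat{v}_{t}$ without defining them. Assuming they are the standard Adam estimators, with decay rates $\beta_{1}$ and $\beta_{2}$ and initial values $m_{0} = v_{0} = 0$ (none of which is spelled out in the text here), they would read:

$$ m_{t} = \beta_{1} m_{t-1} + \left(1-\beta_{1}\right) g_{t}, \qquad \hat{m}_{t} = \frac{m_{t}}{1-\beta_{1}^{t}} $$

$$ v_{t} = \beta_{2} v_{t-1} + \left(1-\beta_{2}\right) g^{2}_{t}, \qquad \hat{v}_{t} = \frac{v_{t}}{1-\beta_{2}^{t}} $$

Under this assumption, setting $v_{1} = v_{2} = 1$ reduces the update to the usual Adam step, with all operations applied elementwise.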

It is recommended to set $v_{2} = 1$ and to keep $\beta_{2}$ the same as in Adam.
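
A single step of the update rule can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `qhadam_step`, the list-of-floats representation, and the defaults for `lr`, `beta1`, `beta2`, `nu1`, and `eps` are assumptions made for the example. Only `nu2=1.0` follows the recommendation above, and the moment estimators are assumed to be Adam's standard bias-corrected ones.

```python
import math


def qhadam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                nu1=0.7, nu2=1.0, eps=1e-8):
    """Apply one QHAdam step elementwise; t is the 1-based step count.

    theta, grad, m, v are lists of floats of equal length.
    nu1 and nu2 play the role of v_1 and v_2 in the formula.
    Returns the updated (theta, m, v) as new lists.
    """
    new_theta, new_m, new_v = [], [], []
    for th, g, m_i, v_i in zip(theta, grad, m, v):
        # Exponential moving averages of the gradient and squared gradient
        # (assumed to match Adam's estimators).
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction, as in Adam.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Quasi-hyperbolic averages of the current gradient and the estimates.
        numer = (1.0 - nu1) * g + nu1 * m_hat
        denom = math.sqrt((1.0 - nu2) * g * g + nu2 * v_hat) + eps
        new_theta.append(th - lr * numer / denom)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


if __name__ == "__main__":
    # Toy usage: minimize f(x) = x^2, whose gradient is 2x.
    theta, m, v = [1.0], [0.0], [0.0]
    for step in range(1, 201):
        grad = [2.0 * x for x in theta]
        theta, m, v = qhadam_step(theta, grad, m, v, t=step, lr=0.05)
    print(theta)
```

With `nu1=1.0` and `nu2=1.0` the numerator and denominator collapse to the bias-corrected estimates alone, so the sketch behaves like a plain Adam step. With `nu1=0.0` the numerator is just the current gradient.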

Source: Quasi-hyperbolic momentum and Adam for deep learning

