Quasi-Hyperbolic Momentum (QHM) is a stochastic optimization technique that modifies momentum SGD by taking a weighted average of a plain SGD step and a momentum step:
$$ g_{t+1} = \beta \cdot g_{t} + \left(1 - \beta\right)\cdot\nabla\hat{L}_{t}\left(\theta_{t}\right) $$
$$ \theta_{t+1} = \theta_{t} - \alpha\left[\left(1 - v\right)\cdot\nabla\hat{L}_{t}\left(\theta_{t}\right) + v\cdot g_{t+1}\right] $$
The authors suggest a rule of thumb of $v = 0.7$ and $\beta = 0.999$.
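The update can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `qhm_step` and `quadratic_grad`, the toy quadratic loss, and the learning rate `alpha=0.1` are assumptions made for the example; only the defaults for `beta` and `v` come from the rule of thumb above.

```python
def qhm_step(theta, g, grad, alpha=0.1, beta=0.999, v=0.7):
    """Apply one QHM update to lists of floats; returns (new_theta, new_g).

    alpha=0.1 is an arbitrary choice for this sketch, not a recommended value.
    """
    # Momentum buffer: exponential moving average of the gradients.
    new_g = [beta * gi + (1.0 - beta) * di for gi, di in zip(g, grad)]
    # Parameter step: weighted average of the plain gradient and the buffer.
    new_theta = [
        ti - alpha * ((1.0 - v) * di + v * gi)
        for ti, di, gi in zip(theta, grad, new_g)
    ]
    return new_theta, new_g


def quadratic_grad(theta):
    # Gradient of the toy loss L(theta) = 0.5 * sum(theta_i ** 2).
    return list(theta)


# Run a few hundred steps on the toy loss, starting from a zero buffer.
theta = [1.0, -2.0]
g = [0.0, 0.0]
for _ in range(200):
    theta, g = qhm_step(theta, g, quadratic_grad(theta))
```

Setting `v = 0` in this sketch reduces the step to plain SGD, and `v = 1` reduces it to momentum SGD with an exponentially averaged gradient buffer, which follows directly from the second equation.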
