AdaSqrt is a stochastic optimization technique motivated by the observation that adaptive methods such as Adagrad and Adam can be viewed as relaxations of Natural Gradient Descent.
The updates are performed as follows:
$$ t \leftarrow t + 1 $$
$$ \alpha_{t} \leftarrow \sqrt{t} $$
$$ g_{t} \leftarrow \nabla_{\theta}f\left(\theta_{t-1}\right) $$
$$ S_{t} \leftarrow S_{t-1} + g_{t}^{2} $$
$$ \theta_{t} \leftarrow \theta_{t-1} - \eta\frac{\alpha_{t}g_{t}}{S_{t} + \epsilon} $$
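Note that, unlike Adagrad, the denominator is the accumulator $S_t$ itself rather than its square root; the growing $\sqrt{t}$ factor in the numerator compensates for this. Below is a minimal NumPy sketch of these updates on a toy quadratic; the names `adasqrt`, `grad_fn`, `eta`, `eps`, and `num_steps` are illustrative assumptions, not from the paper.

```python
import numpy as np

def adasqrt(grad_fn, theta0, eta=0.1, eps=1e-8, num_steps=1000):
    """Sketch of the AdaSqrt updates above (names are illustrative)."""
    theta = np.asarray(theta0, dtype=float).copy()
    S = np.zeros_like(theta)                    # S_0 = 0
    for t in range(1, num_steps + 1):
        alpha = np.sqrt(t)                      # alpha_t = sqrt(t)
        g = grad_fn(theta)                      # g_t = grad f(theta_{t-1})
        S += g ** 2                             # S_t = S_{t-1} + g_t^2
        theta -= eta * alpha * g / (S + eps)    # descent step
    return theta

# Usage: minimize f(x, y) = x^2 + 10*y^2 from a toy starting point.
grad = lambda th: np.array([2.0 * th[0], 20.0 * th[1]])
print(adasqrt(grad, [3.0, -2.0]))
```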
Source: Second-order Information in First-order Optimization Methods