AMSGrad is a stochastic optimization method that seeks to fix a convergence issue with Adam-based optimizers. AMSGrad uses the maximum of past squared gradients $v_{t}$ rather than the exponential moving average to update the parameters:
$$m_{t} = \beta_{1}m_{t-1} + \left(1-\beta_{1}\right)g_{t} $$
$$v_{t} = \beta_{2}v_{t-1} + \left(1-\beta_{2}\right)g_{t}^{2}$$
$$ \hat{v}_{t} = \max\left(\hat{v}_{t-1}, v_{t}\right) $$
$$\theta_{t+1} = \theta_{t} - \frac{\eta}{\sqrt{\hat{v}_{t}} + \epsilon}m_{t}$$
Source: On the Convergence of Adam and Beyond
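Below is a minimal NumPy sketch of the update rule above, matching the formulas as written (no bias correction). The function name `amsgrad_step` and the toy quadratic objective are illustrative choices, not part of the original method description; in practice, frameworks such as PyTorch expose this variant via `torch.optim.Adam(..., amsgrad=True)`.

```python
import numpy as np

def amsgrad_step(theta, g, m, v, v_hat, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update. `theta` and `g` are arrays of the same shape;
    `m`, `v`, `v_hat` are the running state, initialised to zeros.
    Returns the updated parameters and state."""
    m = beta1 * m + (1 - beta1) * g        # first-moment estimate m_t
    v = beta2 * v + (1 - beta2) * g**2     # second-moment estimate v_t
    v_hat = np.maximum(v_hat, v)           # max of past squared gradients
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_hat

# Toy usage: minimise f(x) = x^2 (hypothetical example)
theta = np.array([5.0])
m = v = v_hat = np.zeros_like(theta)
for _ in range(1000):
    g = 2 * theta                          # gradient of x^2
    theta, m, v, v_hat = amsgrad_step(theta, g, m, v, v_hat, lr=0.1)
```

Because $\hat{v}_{t}$ is non-decreasing, the effective step size for each coordinate never increases, which is the property the paper uses to restore convergence guarantees that vanilla Adam can violate.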
| Task | Papers | Share |
|---|---|---|
| Deep Learning | 3 | 9.68% |
| Vocal Bursts Type Prediction | 2 | 6.45% |
| Quantization | 2 | 6.45% |
| Bilevel Optimization | 2 | 6.45% |
| Open-Ended Question Answering | 2 | 6.45% |
| Time Series Analysis | 2 | 6.45% |
| BIG-bench Machine Learning | 2 | 6.45% |
| Federated Learning | 1 | 3.23% |
| Speech Recognition | 1 | 3.23% |