AMSGrad is a stochastic optimization method that seeks to fix a convergence issue with Adam-based optimizers. AMSGrad uses the maximum of past squared gradients $v_{t}$, rather than their exponential moving average, to update the parameters:
$$m_{t} = \beta_{1}m_{t-1} + \left(1-\beta_{1}\right)g_{t}$$
$$v_{t} = \beta_{2}v_{t-1} + \left(1-\beta_{2}\right)g_{t}^{2}$$
$$\hat{v}_{t} = \max\left(\hat{v}_{t-1}, v_{t}\right)$$
$$\theta_{t+1} = \theta_{t} - \frac{\eta}{\sqrt{\hat{v}_{t}} + \epsilon}m_{t}$$
Source: On the Convergence of Adam and Beyond (Reddi et al., 2018)
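The four update equations above can be sketched directly in NumPy. This is a minimal illustration, not a production optimizer: the hyperparameters (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the common Adam defaults assumed for demonstration, and, following the original AMSGrad formulation, no bias correction is applied to $m_t$ or $v_t$.

```python
import numpy as np

def amsgrad_step(theta, g, m, v, v_hat, lr=0.001,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update; optimizer state (m, v, v_hat) is returned to the caller."""
    m = beta1 * m + (1 - beta1) * g        # exponential average of gradients
    v = beta2 * v + (1 - beta2) * g**2     # exponential average of squared gradients
    v_hat = np.maximum(v_hat, v)           # max of all past v_t (never decreases)
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_hat

# Toy usage: minimize f(x) = x^2 starting from x = 5.
theta = np.array([5.0])
m = v = v_hat = np.zeros_like(theta)
for _ in range(2000):
    g = 2.0 * theta                        # gradient of x^2
    theta, m, v, v_hat = amsgrad_step(theta, g, m, v, v_hat, lr=0.1)
```

Because $\hat{v}_t$ is the running maximum rather than the current average, the effective per-coordinate step size is non-increasing, which is the property the convergence fix relies on. In practice, deep learning frameworks expose this variant as a switch on Adam; for example, PyTorch's `torch.optim.Adam` accepts an `amsgrad=True` flag.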

| Task | Papers | Share |
| --- | --- | --- |
| Deep Learning | 3 | 9.68% |
| Vocal Bursts Type Prediction | 2 | 6.45% |
| Quantization | 2 | 6.45% |
| Bilevel Optimization | 2 | 6.45% |
| Open-Ended Question Answering | 2 | 6.45% |
| Time Series Analysis | 2 | 6.45% |
| BIG-bench Machine Learning | 2 | 6.45% |
| Federated Learning | 1 | 3.23% |
| Speech Recognition | 1 | 3.23% |