Stochastic Optimization

Introduced by Reddi et al. in *On the Convergence of Adam and Beyond*.

AMSGrad is a stochastic optimization method that seeks to fix a convergence issue with Adam-based optimizers. AMSGrad uses the maximum of past squared-gradient estimates $v_{t}$, rather than the exponential moving average itself, to scale the parameter update:

$$m_{t} = \beta_{1}m_{t-1} + \left(1-\beta_{1}\right)g_{t}$$

$$v_{t} = \beta_{2}v_{t-1} + \left(1-\beta_{2}\right)g_{t}^{2}$$

$$\hat{v}_{t} = \max\left(\hat{v}_{t-1}, v_{t}\right)$$

$$\theta_{t+1} = \theta_{t} - \frac{\eta}{\sqrt{\hat{v}_{t}} + \epsilon}m_{t}$$
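The update rules above can be sketched as a single step function. This is a minimal illustrative sketch in NumPy, not a reference implementation; the function name, argument names, and default hyperparameters are assumptions chosen to mirror the notation in the equations.

```python
import numpy as np

def amsgrad_step(theta, g, m, v, v_hat,
                 lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update (illustrative sketch; names are assumptions).

    theta : current parameters, g : gradient at theta,
    m, v  : first/second moment estimates, v_hat : running max of v.
    """
    m = beta1 * m + (1 - beta1) * g            # m_t = b1*m_{t-1} + (1-b1)*g_t
    v = beta2 * v + (1 - beta2) * g ** 2       # v_t = b2*v_{t-1} + (1-b2)*g_t^2
    v_hat = np.maximum(v_hat, v)               # v_hat_t = max(v_hat_{t-1}, v_t)
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)  # parameter update
    return theta, m, v, v_hat
```

Because $\hat{v}_{t}$ is a running maximum, the effective per-coordinate step size is non-increasing, which is the key difference from Adam and the source of AMSGrad's convergence guarantee.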
