Adam is an adaptive learning rate optimization algorithm that utilises both momentum and scaling, combining the benefits of RMSProp and SGD with momentum. The optimizer is designed to be appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
The weight updates are performed as:
$$ w_{t} = w_{t-1} - \eta\frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon} $$
with
$$ \hat{m}_{t} = \frac{m_{t}}{1-\beta^{t}_{1}} $$
$$ \hat{v}_{t} = \frac{v_{t}}{1-\beta^{t}_{2}} $$
$$ m_{t} = \beta_{1}m_{t-1} + (1-\beta_{1})g_{t} $$
$$ v_{t} = \beta_{2}v_{t-1} + (1-\beta_{2})g_{t}^{2} $$
$ \eta $ is the step size/learning rate, around 1e-3 in the original paper. $ \epsilon $ is a small number, typically 1e-8 or 1e-10, to prevent dividing by zero. $ \beta_{1} $ and $ \beta_{2} $ are forgetting parameters (the exponential decay rates for the moment estimates), with typical values 0.9 and 0.999, respectively.
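The update equations above translate directly into code. Below is a minimal NumPy sketch of a single Adam step; the function name `adam_update` and the toy quadratic objective in the usage example are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def adam_update(w, g, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for parameters w given gradient g.

    m and v are the running first and second moment estimates;
    t is the 1-based timestep used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * g          # biased first moment estimate
    v = beta2 * v + (1 - beta2) * g**2       # biased second moment estimate
    m_hat = m / (1 - beta1**t)               # bias-corrected first moment
    v_hat = v / (1 - beta2**t)               # bias-corrected second moment
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Usage sketch: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 1001):
    g = 2 * w
    w, m, v = adam_update(w, g, m, v, t)
```

The bias correction matters most in the first few steps, when $ m_{t} $ and $ v_{t} $ are still close to their zero initialization.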
Source: Adam: A Method for Stochastic Optimization

| Task | Papers | Share |
|---|---|---|
| Language Modelling | 58 | 7.59% |
| Retrieval | 35 | 4.58% |
| Question Answering | 30 | 3.93% |
| Large Language Model | 26 | 3.40% |
| Semantic Segmentation | 21 | 2.75% |
| In-Context Learning | 16 | 2.09% |
| Object Detection | 14 | 1.83% |
| Image Classification | 12 | 1.57% |
| Sentence | 11 | 1.44% |