AdamW

Introduced by Loshchilov et al. in Decoupled Weight Decay Regularization

AdamW is a stochastic optimization method that modifies the typical implementation of weight decay in Adam by decoupling weight decay from the gradient update. To see the difference, note that $L_{2}$ regularization in Adam is usually implemented by adding the decay term to the gradient, where $w_{t}$ is the rate of the weight decay at time $t$:

$$ g_{t} = \nabla{f\left(\theta_{t}\right)} + w_{t}\theta_{t}$$

while AdamW instead applies the weight decay term directly in the parameter update, outside the adaptive rescaling of the gradient:

$$ \theta_{t+1} = \theta_{t} - \eta\left(\frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon} + w_{t}\theta_{t}\right), \forall{t}$$

Source: Decoupled Weight Decay Regularization
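
The difference between the two update rules can be illustrated with a minimal pure-Python sketch. This is not the authors' reference code: the function name `adam_step`, its `decoupled` flag, and the default hyperparameters are assumptions made for illustration, and the learning-rate schedule multiplier used in the paper is omitted.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=True):
    """One optimizer step on lists of floats (hypothetical helper).

    decoupled=False: Adam with L2 regularization (decay added to the gradient).
    decoupled=True:  AdamW-style update (decay added to the parameter update).
    Returns the new (theta, m, v); t is the 1-based step count.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        if not decoupled:
            # L2 regularization: the decay term enters the gradient, so it is
            # also rescaled by the adaptive denominator below.
            g = g + weight_decay * p
        # Exponential moving averages of the gradient and squared gradient.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias-corrected moment estimates.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        update = m_hat / (math.sqrt(v_hat) + eps)
        if decoupled:
            # Decoupled decay: added after the adaptive rescaling.
            update = update + weight_decay * p
        new_theta.append(p - lr * update)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Zero loss gradient, so only the weight decay term moves the parameter.
decayed_w, _, _ = adam_step([1.0], [0.0], [0.0], [0.0], t=1,
                            lr=0.1, weight_decay=0.1, decoupled=True)
decayed_l2, _, _ = adam_step([1.0], [0.0], [0.0], [0.0], t=1,
                             lr=0.1, weight_decay=0.1, decoupled=False)
print(decayed_w, decayed_l2)
```

In this toy case the decoupled step shrinks the parameter from 1.0 to about 0.99, in proportion to the decay rate, while the $L_{2}$ variant moves it to about 0.9 because the decay term is normalized by $\sqrt{\hat{v}_{t}}$ along with the rest of the gradient.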

Latest Papers

| PAPER | AUTHORS | DATE |
| --- | --- | --- |
| Longformer for MS MARCO Document Re-ranking Task | Ivan Sekulić, Amir Soleimani, Mohammad Aliannejadi, Fabio Crestani | 2020-09-20 |
| Efficient Transformers: A Survey | Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler | 2020-09-14 |
| Fine-Tune Longformer for Jointly Predicting Rumor Stance and Veracity | Anant Khandelwal | 2020-07-15 |
| Document Classification for COVID-19 Literature | Bernal Jiménez Gutiérrez, Juncheng Zeng, Dongdong Zhang, Ping Zhang, Yu Su | 2020-06-15 |
| ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning | Zhewei Yao, Amir Gholami, Sheng Shen, Kurt Keutzer, Michael W. Mahoney | 2020-06-01 |
| Longformer: The Long-Document Transformer | Iz Beltagy, Matthew E. Peters, Arman Cohan | 2020-04-10 |
| Automated Pavement Crack Segmentation Using U-Net-based Convolutional Neural Network | Stephen L. H. Lau, Edwin K. P. Chong, Xu Yang, Xin Wang | 2020-01-07 |
| Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks | Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen | 2019-05-27 |
| A unified theory of adaptive stochastic gradient descent as Bayesian filtering | Laurence Aitchison | 2019-05-01 |
| Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods | Laurence Aitchison | 2018-07-19 |
| Decoupled Weight Decay Regularization | Ilya Loshchilov, Frank Hutter | 2017-11-14 |
