
LARS

Introduced by You et al. in Large Batch Training of Convolutional Networks

Layer-wise Adaptive Rate Scaling, or LARS, is a large batch optimization technique. There are two notable differences between LARS and other adaptive algorithms such as Adam or RMSProp: first, LARS uses a separate learning rate for each layer rather than for each weight; second, the magnitude of the update is controlled with respect to the weight norm, which gives better control over training speed.

$$m_{t} = \beta_{1}m_{t-1} + \left(1-\beta_{1}\right)\left(g_{t} + \lambda x_{t}\right)$$

$$x_{t+1}^{\left(i\right)} = x_{t}^{\left(i\right)} - \eta_{t}\frac{\phi\left(\left\|x_{t}^{\left(i\right)}\right\|\right)}{\left\|m_{t}^{\left(i\right)}\right\|}m_{t}^{\left(i\right)}$$

where $m_{t}$ is the momentum buffer, $g_{t}$ the gradient, $\lambda$ the weight decay coefficient, $\eta_{t}$ the learning rate, $\phi$ a scaling function, and the superscript $\left(i\right)$ indexes the network's layers.
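For concreteness, here is a minimal NumPy sketch of one LARS step for a single layer, taking $\phi$ as the identity and adding a small epsilon for numerical stability; the function name `lars_step`, its default hyperparameters, and the toy layer shapes are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lars_step(x, g, m, lr, beta1=0.9, weight_decay=1e-4, eps=1e-9):
    """One LARS update for a single layer's weight array x.

    x: weights, g: gradient of the loss w.r.t. x, m: momentum buffer.
    Returns the updated (x, m). phi is taken as the identity here.
    """
    # m_t = beta1 * m_{t-1} + (1 - beta1) * (g_t + lambda * x_t)
    m = beta1 * m + (1.0 - beta1) * (g + weight_decay * x)
    # Layer-wise trust ratio phi(||x||) / ||m||; eps avoids division by
    # zero and is not part of the formula above.
    trust_ratio = np.linalg.norm(x) / (np.linalg.norm(m) + eps)
    # Each layer gets its own effective learning rate lr * trust_ratio.
    x = x - lr * trust_ratio * m
    return x, m

# One optimizer step: apply the rule independently to every layer.
weights = [np.random.randn(64, 32), np.random.randn(32, 10)]
grads = [np.random.randn(*w.shape) for w in weights]
momenta = [np.zeros_like(w) for w in weights]
for i in range(len(weights)):
    weights[i], momenta[i] = lars_step(weights[i], grads[i], momenta[i], lr=0.1)
```

Note how the norm ratio, not the raw gradient magnitude, sets the step size: each layer's update is rescaled relative to its own weight norm, which is what makes the per-layer learning rate adaptive.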

Source: Large Batch Training of Convolutional Networks

Latest Papers

Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples
Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat
2021-04-28

Self-Supervised Training Enhances Online Continual Learning
Jhair Gallardo, Tyler L. Hayes, Christopher Kanan
2021-03-25

Self-supervised Pretraining of Visual Features in the Wild
Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski
2021-03-02

A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Zachary Nado, Justin M. Gilmer, Christopher J. Shallue, Rohan Anil, George E. Dahl
2021-02-12

Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate Scaling (LARS) Optimizer
Kanchan Chowdhury, Ankita Sharma, Arun Deepak Chandrasekar
2021-02-05

Fast Training of Contrastive Learning with Intermediate Contrastive Loss
Anonymous
2021-01-01

Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Fengli Gao, Huicai Zhong
2020-12-16

Improving Layer-wise Adaptive Rate Methods using Trust Ratio Clipping
Jeffrey Fong, Siwei Chen, Kaiqi Chen
2020-11-27

Self-Supervised Ranking for Representation Learning
Ali Varamesh, Ali Diba, Tinne Tuytelaars, Luc van Gool
2020-10-14

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
2020-06-17

Supervised Contrastive Learning
Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan
2020-04-23

A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
2020-02-13

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
2019-04-01

SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties
Jian Huang, Yuling Jiao, Xiliang Lu, Yueyong Shi, Qinglong Yang
2018-10-09

Convergence Analysis of Gradient Descent Algorithms with Proportional Updates
Igor Gitman, Deepak Dilipkumar, Ben Parr
2018-01-09

Large Batch Training of Convolutional Networks with Layer-wise Adaptive Rate Scaling
Boris Ginsburg, Igor Gitman, Yang You
2018-01-01

ImageNet Training in Minutes
Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, Kurt Keutzer
2017-09-14

Large Batch Training of Convolutional Networks
Yang You, Igor Gitman, Boris Ginsburg
2017-08-13
