Layer-wise Adaptive Rate Scaling (LARS) is an optimization technique for large-batch training. It differs from other adaptive algorithms such as Adam or RMSProp in two notable ways: first, LARS uses a separate learning rate for each layer rather than for each weight; second, the magnitude of the update is controlled with respect to the weight norm, giving better control over training speed.
$$m_{t} = \beta_{1} m_{t-1} + \left(1-\beta_{1}\right)\left(g_{t} + \lambda x_{t}\right)$$ $$x_{t+1}^{\left(i\right)} = x_{t}^{\left(i\right)} - \eta_{t}\,\frac{\phi\left(\left\| x_{t}^{\left(i\right)} \right\|\right)}{\left\| m_{t}^{\left(i\right)} \right\|}\, m_{t}^{\left(i\right)} $$
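To make the per-layer update concrete, here is a minimal NumPy sketch of the formulas above. The function name `lars_step`, the choice $\phi(z) = \text{trust\_coeff} \cdot z$ with a small trust coefficient, and the fallback when a norm is zero are illustrative assumptions, not part of the formula; in practice LARS is usually implemented as an optimizer inside a deep learning framework.

```python
import numpy as np

def lars_step(params, grads, momenta, lr,
              beta1=0.9, weight_decay=1e-4, trust_coeff=1e-3, eps=1e-9):
    """One LARS-with-momentum step, applied in place, layer by layer."""
    for x, g, m in zip(params, grads, momenta):
        # m_t = beta1 * m_{t-1} + (1 - beta1) * (g_t + lambda * x_t)
        m[:] = beta1 * m + (1.0 - beta1) * (g + weight_decay * x)

        # Layer-wise scaling phi(||x^(i)||) / ||m^(i)||, assuming
        # phi(z) = trust_coeff * z (a common but not mandated choice).
        x_norm = np.linalg.norm(x)
        m_norm = np.linalg.norm(m)
        if x_norm > 0.0 and m_norm > 0.0:
            local_lr = trust_coeff * x_norm / (m_norm + eps)
        else:
            local_lr = 1.0  # fall back to the plain learning rate

        # x_{t+1}^(i) = x_t^(i) - eta_t * local_lr * m_t^(i)
        x -= lr * local_lr * m
```

A quick usage example with two hypothetical layers (a weight matrix and a bias vector):

```python
params  = [np.random.randn(256, 128), np.random.randn(128)]
grads   = [np.random.randn(*p.shape) for p in params]
momenta = [np.zeros_like(p) for p in params]
lars_step(params, grads, momenta, lr=0.1)
```

Note how the learning rate each layer actually sees depends on the ratio of its weight norm to its momentum norm, which is what keeps updates proportionate across layers of very different scales.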
Source: Large Batch Training of Convolutional Networks
Tasks for which LARS is most commonly used, by share of papers:

Task | Papers | Share |
---|---|---|
Image Classification | 10 | 13.89% |
Self-Supervised Learning | 9 | 12.50% |
Object Detection | 4 | 5.56% |
Semantic Segmentation | 4 | 5.56% |
Semi-Supervised Image Classification | 4 | 5.56% |
Self-Supervised Image Classification | 3 | 4.17% |
Instance Segmentation | 2 | 2.78% |
BIG-bench Machine Learning | 2 | 2.78% |
Question Answering | 2 | 2.78% |