1-bit LAMB

Introduced by Li et al. in 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed

1-bit LAMB is a communication-efficient stochastic optimization technique that supports adaptive layerwise learning rates even when communication is compressed. Building on the insights behind 1-bit Adam, it is a 2-stage algorithm that uses LAMB (warmup stage) to “pre-condition” a communication-compressed momentum SGD algorithm (compression stage). In the compression stage, where the original LAMB algorithm cannot be used to update the layerwise learning rates, 1-bit LAMB adaptively scales them based on information from both the warmup and compression stages. As a result, 1-bit LAMB achieves the convergence speed of LAMB (large batch optimization) under compressed communication.
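To make "compressed communication" concrete, the following is a minimal sketch of 1-bit compression with error compensation applied to a momentum vector. It is an illustration only and not the paper's implementation: the function name `one_bit_compress`, the use of plain Python lists, and the choice of the mean absolute value as the shared scale are all assumptions made for this example.

```python
def one_bit_compress(values, error):
    """Compress `values` to one sign bit per entry plus a single shared scale.

    `error` holds the residual left over from the previous call
    (error compensation). Returns (decompressed_values, new_error).
    Hypothetical helper for illustration, not taken from the paper.
    """
    # Add back what the previous round of compression lost.
    corrected = [v + e for v, e in zip(values, error)]
    # One shared magnitude for the whole vector (assumed: mean absolute value).
    scale = sum(abs(c) for c in corrected) / len(corrected)
    # Each entry is sent as a single sign bit; zero is treated as positive.
    decompressed = [scale if c >= 0 else -scale for c in corrected]
    # Remember what was lost so the next step can compensate for it.
    new_error = [c - d for c, d in zip(corrected, decompressed)]
    return decompressed, new_error


momentum = [0.5, -0.1, 0.3, -0.7]
error = [0.0] * len(momentum)
sent, error = one_bit_compress(momentum, error)
```

After one call, every entry of `sent` has the same magnitude and only its sign differs, while `error` carries the residual that is added back before the next compression step.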

There are two major differences between 1-bit LAMB and the original LAMB:

During the compression stage, 1-bit LAMB updates the layerwise learning rate using a novel “reconstructed gradient” derived from the compressed momentum. This makes 1-bit LAMB compatible with error compensation and lets it keep track of the training dynamics under compression.
1-bit LAMB also introduces extra stabilized soft thresholds when updating the layerwise learning rate in the compression stage, which makes training more stable under compression.
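For readers unfamiliar with layerwise learning rates, here is a minimal sketch of a LAMB-style per-layer scaling ratio (weight norm divided by update norm) with simple clamping. It does not reproduce the reconstructed gradient or the soft thresholds of 1-bit LAMB, whose exact formulas are given in the paper; the names `layerwise_scale`, `min_coeff`, and `max_coeff` and the bound values are hypothetical and only stand in for the idea of keeping the ratio in a stable range.

```python
import math


def l2_norm(xs):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in xs))


def layerwise_scale(weights, update, min_coeff=0.01, max_coeff=10.0):
    """LAMB-style ratio ||weights|| / ||update||, clamped to assumed bounds."""
    w_norm = l2_norm(weights)
    u_norm = l2_norm(update)
    # Fall back to a neutral scale when either norm is zero.
    if w_norm == 0.0 or u_norm == 0.0:
        return 1.0
    return min(max(w_norm / u_norm, min_coeff), max_coeff)


# Two toy "layers": (weights, proposed update).
layers = {
    "dense": ([3.0, 4.0], [0.6, 0.8]),   # raw ratio about 5, inside the bounds
    "bias": ([1.0, 0.0], [0.0, 0.001]),  # raw ratio about 1000, clamped to max_coeff
}
scales = {name: layerwise_scale(w, u) for name, (w, u) in layers.items()}
```

Each layer's base learning rate would then be multiplied by its entry in `scales`, so layers whose updates are small relative to their weights take proportionally larger steps, up to the clamp.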

Source: 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed

Tasks

Task	Papers	Share
Ensemble Learning	1	5.00%
Specificity	1	5.00%
In-Context Learning	1	5.00%
Language Modelling	1	5.00%
Speech Synthesis	1	5.00%
Text-To-Speech Synthesis	1	5.00%
Computational Efficiency	1	5.00%
Speech Enhancement	1	5.00%
Image Generation	1	5.00%

Components

Component	Type
LAMB	Large Batch Optimization

Categories

Stochastic Optimization

Large Batch Optimization