1-bit LAMB is a communication-efficient stochastic optimization technique that introduces a novel way to support adaptive layerwise learning rates even when communication is compressed. Building on the insights behind 1-bit Adam, it is a two-stage algorithm that uses LAMB (the warmup stage) to "pre-condition" a communication-compressed momentum SGD algorithm (the compression stage). During the compression stage, where the original LAMB algorithm cannot be used to update the layerwise learning rates, 1-bit LAMB adaptively scales the layerwise learning rates based on information from both the warmup and compression stages. As a result, 1-bit LAMB achieves the convergence speed of large-batch optimization (LAMB) under compressed communication.
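The two-stage structure can be illustrated with a heavily simplified, hypothetical sketch. The function names (`compress`, `lamb_ratio`, `step`), the sign-plus-mean-magnitude compressor, and the single-layer NumPy setup are illustrative assumptions, not the actual implementation; in particular, the real algorithm adaptively rescales the layerwise coefficients during the compression stage, whereas this sketch simply reuses the ratio recorded at the end of warmup.

```python
# Hypothetical single-layer sketch of the two-stage 1-bit LAMB idea.
# Assumptions (not from the paper's code): sign-plus-scale 1-bit compression,
# and a layerwise trust ratio frozen from the warmup stage.
import numpy as np

def compress(v):
    """1-bit compression: keep only signs, rescaled by the mean magnitude."""
    scale = np.mean(np.abs(v))
    return scale * np.sign(v)

def lamb_ratio(w, update, eps=1e-6):
    """LAMB-style layerwise trust ratio: ||w|| / ||update||."""
    return np.linalg.norm(w) / (np.linalg.norm(update) + eps)

def step(w, grad, m, lr, t, warmup_steps, frozen_ratio, beta=0.9):
    m = beta * m + (1 - beta) * grad      # momentum accumulation
    if t < warmup_steps:
        # Warmup stage: uncompressed LAMB-style update with a fresh trust ratio.
        ratio = lamb_ratio(w, m)
        frozen_ratio = ratio              # remember scaling for the compression stage
    else:
        # Compression stage: only the 1-bit compressed momentum is communicated,
        # and the warmup-stage layerwise scale is reused (simplification).
        m = compress(m)
        ratio = frozen_ratio
    w = w - lr * ratio * m
    return w, m, frozen_ratio
```

In a real distributed run the compressed momentum (signs plus a scale) is what gets all-reduced across workers, which is where the communication savings come from.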
There are two major differences between 1-bit LAMB and the original LAMB:
Task | Papers | Share
---|---|---
Image Generation | 1 | 10.00%
Text to image generation | 1 | 10.00%
Text-to-Image Generation | 1 | 10.00%
Text-to-Video Generation | 1 | 10.00%
Video Generation | 1 | 10.00%
Video Prediction | 1 | 10.00%
2D Human Pose Estimation | 1 | 10.00%
2D object detection | 1 | 10.00%
2D Semantic Segmentation | 1 | 10.00%