Distributed Methods

Zero Redundancy Optimizer (ZeRO) is a sharded data-parallel method for distributed training. ZeRO-DP removes memory redundancy across data-parallel processes by partitioning the model states (optimizer states, gradients, and parameters) instead of replicating them. It keeps the compute and communication efficiency of standard data parallelism (DP) by preserving DP's computational granularity and communication volume through a dynamic communication schedule during training.

Source: ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
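
To make the effect of partitioning concrete, the sketch below estimates per-device memory for model states. It is an illustration rather than part of the description above. It assumes the accounting used in the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and K = 12 bytes for fp32 optimizer states. The function name, the `stage` argument, and the example sizes are hypothetical choices made for this sketch.

```python
def zero_model_state_bytes(num_params, dp_degree, stage, k=12):
    """Estimate per-device bytes for model states under mixed-precision Adam.

    stage 0: plain data parallelism (everything replicated)
    stage 1: optimizer states partitioned across dp_degree devices
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients, and parameters partitioned
    Assumption: k=12 bytes/param of optimizer state (fp32 weights copy,
    momentum, variance), following the ZeRO paper's accounting.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2 * num_params  # fp16 parameters
    grads = 2 * num_params   # fp16 gradients
    optim = k * num_params   # fp32 optimizer states
    if stage >= 1:
        optim /= dp_degree
    if stage >= 2:
        grads /= dp_degree
    if stage >= 3:
        params /= dp_degree
    return params + grads + optim


if __name__ == "__main__":
    # Hypothetical example: 7.5B parameters, 64 data-parallel devices.
    for stage in range(4):
        gb = zero_model_state_bytes(7_500_000_000, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB per device")
```

Under these assumptions, a 7.5B-parameter model on 64 devices drops from about 120 GB of model states per device with plain DP to about 31.4 GB, 16.6 GB, and 1.9 GB as each additional state is partitioned. Activations, temporary buffers, and fragmentation are not counted here.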




Task                                     Papers   Share
Quantization                             2        28.57%
Language Modelling                       2        28.57%
Large Language Model                     1        14.29%
Cross-Lingual Document Classification    1        14.29%
Image Generation                         1        14.29%


