Distributed Methods

PyTorch DDP (DistributedDataParallel) is PyTorch's implementation of distributed data parallel training. To guarantee mathematical equivalence, all replicas start from the same initial parameter values and synchronize gradients to keep parameters consistent across training iterations. To minimize intrusiveness, the implementation exposes the same forward API as the user model, allowing applications to seamlessly replace occurrences of the user model with the distributed data parallel model object without additional code changes. Several techniques are integrated into the design to deliver high-performance training, including bucketing gradients, overlapping communication with computation, and skipping gradient synchronization.

Source: PyTorch Distributed: Experiences on Accelerating Data Parallel Training
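A minimal sketch of how this looks in practice, assuming one process per GPU launched by an external launcher that sets up rendezvous (e.g. via environment variables); the "nccl" backend, the toy linear model, and the hyperparameters are illustrative choices, not prescribed by the source:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int):
    # Each process joins the default process group before constructing DDP.
    # Rendezvous info (MASTER_ADDR/MASTER_PORT) is assumed to be provided by the launcher.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = nn.Linear(10, 10).to(rank)           # local replica on this process's GPU
    ddp_model = DDP(model, device_ids=[rank])    # wraps the model; exposes the same forward API

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    inputs = torch.randn(20, 10).to(rank)
    labels = torch.randn(20, 10).to(rank)

    optimizer.zero_grad()
    outputs = ddp_model(inputs)                  # drop-in replacement for model(inputs)
    loss_fn(outputs, labels).backward()          # gradients are bucketed and all-reduced,
                                                 # overlapping communication with the backward pass
    optimizer.step()

    dist.destroy_process_group()
```

The skip-synchronization optimization is exposed through the no_sync() context manager on the wrapped model: backward passes run inside it accumulate gradients locally without triggering all-reduce, so communication can be amortized over several micro-batches.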

Tasks


Task                           Papers   Share
Handwriting Recognition        1        50.00%
Handwritten Text Recognition   1        50.00%

