Chimera

Chimera is a pipeline model parallelism scheme that uses bidirectional pipelines to train large-scale models efficiently. The key idea is to run two pipelines in opposite directions (a down pipeline and an up pipeline) over the same set of workers.

Let $N$ denote the number of micro-batches executed by each worker within a training iteration, $D$ the number of pipeline stages (the pipeline depth), and $P$ the number of workers.

The figure in the paper shows an example with four pipeline stages (i.e., $D=4$). Here we assume each worker executes $D$ micro-batches within a training iteration, namely $N=D$, which is the minimum needed to keep all stages active.

In the down pipeline, stage$_{0}$ through stage$_{3}$ are mapped linearly to workers $P_{0}$ through $P_{3}$, while in the up pipeline the stages are mapped in exactly the reverse order. The $N$ micro-batches (assuming $N$ is even) are partitioned equally between the two pipelines, and each pipeline schedules its $N/2$ micro-batches using the 1F1B (one-forward-one-backward) strategy, as shown in the left part of the figure. Merging these two pipelines then yields the Chimera schedule. Given an even number of stages $D$ (easily satisfied in practice), the merge is guaranteed to be conflict-free, i.e., at most one micro-batch occupies any given time slot on each worker.
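As an illustration, below is a minimal Python sketch (not from the paper) that simulates the bidirectional schedule for the $D=4$, $N=4$ example. It assumes, for simplicity, that forward and backward passes each take one time slot and that every worker greedily executes any dependency-ready task, preferring backward work as a rough stand-in for the 1F1B ordering; the task representation and helper functions are ours, so the output approximates rather than reproduces the exact schedule in the paper.

```python
# Toy simulation of Chimera's bidirectional pipelines (assumptions ours):
# forward (F) and backward (B) each take one time slot, and each worker
# greedily runs any ready task, preferring B over F.

D = 4          # pipeline stages (= number of workers P0..P3)
N = D          # micro-batches per iteration (minimum to keep stages active)

def worker_of(stage, direction):
    """Down pipeline maps stages linearly; up pipeline maps them in reverse."""
    return stage if direction == "down" else D - 1 - stage

# The first N/2 micro-batches go down, the rest go up.
direction = {m: "down" if m < N // 2 else "up" for m in range(N)}

def ready(task, done):
    m, s, phase = task
    if phase == "F":                       # forward needs the previous stage
        return s == 0 or (m, s - 1, "F") in done
    # backward starts at the last stage once the forward chain has finished
    return (m, D - 1, "F") in done and (s == D - 1 or (m, s + 1, "B") in done)

pending = [(m, s, ph) for m in range(N) for s in range(D) for ph in ("F", "B")]
done, timeline, t = set(), [[] for _ in range(D)], 0
while pending:
    busy, scheduled = set(), []
    for task in sorted(pending, key=lambda x: x[2] != "B"):  # B first
        m, s, ph = task
        w = worker_of(s, direction[m])
        if w not in busy and ready(task, done):
            busy.add(w)
            timeline[w].append(f"{ph}{m}")
            scheduled.append(task)
    for task in scheduled:                 # tasks finished at slot t unlock
        pending.remove(task)               # their dependents at slot t+1
        done.add(task)
    t += 1

for w, row in enumerate(timeline):
    print(f"P{w}: {' '.join(row)}")
print(f"finished in {t} time slots")
```

Running this prints one row of forward/backward tasks per worker. Because each worker executes at most one task per slot, the merged timeline is conflict-free by construction, which mirrors the property that the even-$D$ merge guarantees for the actual Chimera schedule.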

Source: Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
