TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Fine-Grained Image Classification	Birdsnap	GPipe	Accuracy	83.6%	# 4
Image Classification	CIFAR-10	GPipe + transfer learning	Percentage correct	99	# 19
Image Classification	CIFAR-100	GPipe	Percentage correct	91.3	# 21
Image Classification	ImageNet	GPipe	Top 1 Accuracy	84.4%	# 299
Fine-Grained Image Classification	Stanford Cars	GPipe	Accuracy	94.6%	# 33

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/gpipe-efficient-training-of-giant-neural/fine-grained-image-classification-on-birdsnap)](https://paperswithcode.com/sota/fine-grained-image-classification-on-birdsnap?p=gpipe-efficient-training-of-giant-neural)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/gpipe-efficient-training-of-giant-neural/image-classification-on-cifar-10)](https://paperswithcode.com/sota/image-classification-on-cifar-10?p=gpipe-efficient-training-of-giant-neural)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/gpipe-efficient-training-of-giant-neural/image-classification-on-cifar-100)](https://paperswithcode.com/sota/image-classification-on-cifar-100?p=gpipe-efficient-training-of-giant-neural)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/gpipe-efficient-training-of-giant-neural/fine-grained-image-classification-on-stanford)](https://paperswithcode.com/sota/fine-grained-image-classification-on-stanford?p=gpipe-efficient-training-of-giant-neural)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/gpipe-efficient-training-of-giant-neural/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=gpipe-efficient-training-of-giant-neural)`

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

NeurIPS 2019 · Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen

Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks. To address the need for efficient and task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, GPipe provides the flexibility of scaling a variety of different networks to gigantic sizes efficiently. Moreover, GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is partitioned across multiple accelerators. We demonstrate the advantages of GPipe by training large-scale neural networks on two different tasks with distinct network architectures: (i) Image Classification: We train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012, (ii) Multilingual Neural Machine Translation: We train a single 6-billion-parameter, 128-layer Transformer model on a corpus spanning over 100 languages and achieve better quality than all bilingual models.
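
The batch-splitting idea in the abstract can be illustrated with a toy schedule. The sketch below is not code from the paper or from any of the listed repositories. It assumes a mini-batch is split into equal micro-batches, that every partition takes one time step per micro-batch, and that communication cost and the backward pass are ignored. The function names `forward_schedule` and `bubble_fraction` are hypothetical, chosen only for this example.

```python
from typing import List, Optional


def forward_schedule(num_partitions: int, num_microbatches: int) -> List[List[Optional[int]]]:
    """Return schedule[t][k]: the micro-batch index that partition k processes
    at time step t of the forward pass, or None if that partition is idle.

    Simplifying assumption: every partition takes exactly one time step per
    micro-batch, and transfers between partitions are free.
    """
    total_steps = num_microbatches + num_partitions - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for k in range(num_partitions):
            m = t - k  # micro-batch m reaches partition k after k steps
            row.append(m if 0 <= m < num_microbatches else None)
        schedule.append(row)
    return schedule


def bubble_fraction(num_partitions: int, num_microbatches: int) -> float:
    """Fraction of (time step, partition) slots in which a partition is idle."""
    schedule = forward_schedule(num_partitions, num_microbatches)
    slots = [cell for row in schedule for cell in row]
    return sum(cell is None for cell in slots) / len(slots)


if __name__ == "__main__":
    K, M = 4, 8  # 4 partitions (accelerators), 8 micro-batches
    for t, row in enumerate(forward_schedule(K, M)):
        print(t, ["-" if m is None else f"mb{m}" for m in row])
    print(f"idle fraction: {bubble_fraction(K, M):.3f}")
```

Under these assumptions the idle fraction is (K - 1) / (M + K - 1) for K partitions and M micro-batches, so it shrinks as the mini-batch is split into more micro-batches. This is one way to read the "almost linear speedup" claim, though real speedups also depend on how evenly the layers are balanced across partitions and on communication overhead.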

Code

Repository	Stars
tensorflow/lingvo (Quickstart in Colab)	2,777
qubvel/efficientnet	2,061
KakaoBrain/torchgpipe	780
pytorch/tau	626
pytorch/pippy	626

See all 13 implementations

Tasks

Fine-Grained Image Classification

Image Classification

Machine Translation

Translation

Datasets

CIFAR-10

ImageNet

CIFAR-100

Stanford Cars

Birdsnap

Results from the Paper

Ranked #4 on Fine-Grained Image Classification on Birdsnap (using extra training data)

Methods

Absolute Position Encodings • Adam • AmoebaNet • Average Pooling • BPE • Convolution • Dense Connections • Dropout • GPipe • Label Smoothing • Layer Normalization • Linear Layer • Max Pooling • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Spatially Separable Convolution • Transformer
