Large-scale Pretrained Language Models (PLMs) have become the new paradigm for Natural Language Processing (NLP). PLMs with hundreds of billions of parameters, such as GPT-3, have demonstrated strong performance on natural language understanding and generation with \textit{few-shot in-context} learning. In this work, we present our practice of training large-scale autoregressive language models named PanGu-$\alpha$, with up to 200 billion parameters. PanGu-$\alpha$ is developed under the MindSpore framework and trained on a cluster of 2048 Ascend 910 AI processors. The training parallelism strategy is implemented based on MindSpore Auto-parallel, which composes five parallelism dimensions to scale the training task to 2048 processors efficiently: data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism, and rematerialization. To enhance the generalization ability of PanGu-$\alpha$, we collect 1.1 TB of high-quality Chinese data from a wide range of domains to pretrain the model. We empirically test the generation ability of PanGu-$\alpha$ in various scenarios, including text summarization, question answering, and dialogue generation. Moreover, we investigate the effect of model scale on few-shot performance across a broad range of Chinese NLP tasks. The experimental results demonstrate the superior capabilities of PanGu-$\alpha$ in performing various tasks under few-shot or zero-shot settings.
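The abstract names five parallelism dimensions composed through MindSpore Auto-parallel. The sketch below shows, in general terms, how such a composition can be expressed through MindSpore's auto-parallel context; it is a minimal illustration, not the authors' released training script, and the stage count and cluster size here are placeholders rather than the paper's exact configuration.

```python
# Minimal sketch of composing parallelism dimensions in MindSpore.
# NOT the paper's actual configuration; values are illustrative.
from mindspore import context

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

context.set_auto_parallel_context(
    parallel_mode=context.ParallelMode.SEMI_AUTO_PARALLEL,
    device_num=2048,                 # total Ascend 910 processors in the cluster
    pipeline_stages=16,              # pipeline model parallelism (illustrative value)
    enable_parallel_optimizer=True,  # optimizer model parallelism (shards optimizer states)
    full_batch=True,                 # dataset is loaded as a full global batch
)

# In semi-auto mode, data parallelism and op-level model parallelism are
# expressed by per-operator shard() strategies on the network's ops (not shown).
# Rematerialization (recompute) trades compute for memory and is enabled
# per layer, e.g. transformer_block.recompute() on an nn.Cell.
```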

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Common Sense Reasoning (One-Shot) | C3 | PanGu-α 2.6B | Accuracy | 52.82 | # 1 |
| Common Sense Reasoning (Zero-Shot) | C3 | PanGu-α 2.6B | Accuracy | 53.42 | # 1 |
| Common Sense Reasoning (Few-Shot) | C3 | PanGu-α 2.6B | Accuracy | 53.64 | # 1 |
| Cloze (multi-choices) (Few-Shot) | ChID | PanGu-α 2.6B | Accuracy | 66.56 | # 1 |
| Cloze (multi-choices) (Zero-Shot) | ChID | PanGu-α 2.6B | Accuracy | 68.73 | # 1 |
| Cloze (multi-choices) (One-Shot) | ChID | PanGu-α 2.6B | Accuracy | 68.16 | # 1 |
| Cloze (multi-choices) (One-Shot) | CMRC 2017 | PanGu-α 2.6B | Accuracy | 38.0 | # 1 |
| Cloze (multi-choices) (Zero-Shot) | CMRC 2017 | PanGu-α 2.6B | Accuracy | 37.83 | # 1 |
| Cloze (multi-choices) (Few-Shot) | CMRC 2017 | PanGu-α 2.6B | Accuracy | 36.33 | # 1 |
| Reading Comprehension (One-Shot) | CMRC 2018 | PanGu-α 2.6B | F1 | 18.57 | # 1 |
| Reading Comprehension (One-Shot) | CMRC 2018 | PanGu-α 2.6B | EM | 2.49 | # 1 |
| Reading Comprehension (Few-Shot) | CMRC 2018 | PanGu-α 2.6B | F1 | 23.22 | # 1 |
| Reading Comprehension (Few-Shot) | CMRC 2018 | PanGu-α 2.6B | EM | 5.68 | # 1 |
| Reading Comprehension (Zero-Shot) | CMRC 2018 | PanGu-α 2.6B | F1 | 16.647 | # 1 |
| Reading Comprehension (Zero-Shot) | CMRC 2018 | PanGu-α 2.6B | EM | 1.21 | # 1 |
| Cloze (multi-choices) (Few-Shot) | CMRC 2019 | PanGu-α 2.6B | Accuracy | 62.42 | # 1 |
| Cloze (multi-choices) (One-Shot) | CMRC 2019 | PanGu-α 2.6B | Accuracy | 61.54 | # 1 |
| Cloze (multi-choices) (Zero-Shot) | CMRC 2019 | PanGu-α 2.6B | Accuracy | 61.93 | # 1 |
| Reading Comprehension (Few-Shot) | DRCD | PanGu-α 2.6B | EM | 5.31 | # 1 |
| Reading Comprehension (Few-Shot) | DRCD | PanGu-α 2.6B | F1 | 18.29 | # 1 |
| Reading Comprehension (One-Shot) | DRCD | PanGu-α 2.6B | EM | 2.47 | # 1 |
| Reading Comprehension (One-Shot) | DRCD | PanGu-α 2.6B | F1 | 12.58 | # 1 |
| Reading Comprehension (Zero-Shot) | DRCD | PanGu-α 2.6B | EM | 0.8 | # 1 |
| Reading Comprehension (Zero-Shot) | DRCD | PanGu-α 2.6B | F1 | 9.99 | # 1 |
| Reading Comprehension (One-Shot) | DuReader | PanGu-α 2.6B | ROUGE-1 | 20.18 | # 1 |
| Reading Comprehension (Zero-Shot) | DuReader | PanGu-α 2.6B | ROUGE-1 | 21.07 | # 1 |
| Reading Comprehension (Few-Shot) | DuReader | PanGu-α 2.6B | ROUGE-1 | 21.43 | # 1 |
| Natural Language Inference (One-Shot) | OCNLI | PanGu-α 2.6B | Accuracy | 44.0 | # 1 |
| Natural Language Inference (Few-Shot) | OCNLI | PanGu-α 2.6B | Accuracy | 46.78 | # 1 |
| Natural Language Inference (Zero-Shot) | OCNLI | PanGu-α 2.6B | Accuracy | 42.61 | # 1 |
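The (Zero-Shot), (One-Shot), and (Few-Shot) labels in the table above refer to how many in-context demonstrations are prepended to each query; the model's weights are never updated. The sketch below illustrates that distinction in general terms; the prompt template and the `model.generate` call are hypothetical, not the paper's exact evaluation protocol.

```python
# Minimal sketch of zero-/one-/few-shot prompt construction for in-context
# learning. Template and generation interface are hypothetical placeholders.
train_examples = [
    ("天空是什么颜色?", "蓝色"),       # "What color is the sky?" -> "Blue"
    ("一年有几个月?", "十二个月"),     # "How many months in a year?" -> "Twelve"
]
test_question = "水的化学式是什么?"    # "What is the chemical formula of water?"

def build_prompt(examples, query, k):
    """Prepend k demonstrations (k=0: zero-shot, k=1: one-shot, k>1: few-shot)."""
    demos = "".join(f"问题:{q}\n答案:{a}\n" for q, a in examples[:k])
    return f"{demos}问题:{query}\n答案:"

zero_shot_prompt = build_prompt(train_examples, test_question, k=0)
one_shot_prompt = build_prompt(train_examples, test_question, k=1)
few_shot_prompt = build_prompt(train_examples, test_question, k=2)
# completion = model.generate(few_shot_prompt)  # hypothetical generation call
```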
