Pre-Training with Whole Word Masking for Chinese BERT

19 Jun 2019 · Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu

Bidirectional Encoder Representations from Transformers (BERT) has shown remarkable improvements across various NLP tasks. Recently, an upgraded version of BERT was released with Whole Word Masking (WWM), which mitigates the drawback of masking only part of a WordPiece-tokenized word during BERT pre-training...
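The idea behind Whole Word Masking is that when any piece of a word is selected for masking, every piece of that word is masked together, so the model must predict the full word rather than recovering a fragment from its sibling pieces. The sketch below illustrates this for Chinese, where words are first obtained from a word segmenter (the paper uses a Chinese segmentation tool for this step) and each word typically spans several single-character tokens. This is a minimal illustration, not the authors' implementation; `whole_word_mask` and the hand-written segmentation are hypothetical.

```python
import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Illustrative Whole Word Masking: a word is either masked as a
    whole (all of its tokens become [MASK]) or left fully intact.

    words: list of words, each word a list of its tokens.
    Returns the flat token sequence after masking.
    """
    rng = random.Random(seed)
    masked = []
    for word_tokens in words:
        if rng.random() < mask_prob:
            # Mask every token of the chosen word, not just one piece.
            masked.extend([mask_token] * len(word_tokens))
        else:
            masked.extend(word_tokens)
    return masked

# Chinese text has no whitespace, so the word boundaries must come from
# a segmenter; here "使用语言模型" is segmented by hand as 使用 / 语言 / 模型,
# with one character per token.
words = [["使", "用"], ["语", "言"], ["模", "型"]]
print(whole_word_mask(words, mask_prob=0.5, seed=1))
```

Under character-level masking, 语 could be masked while 言 stays visible, making the prediction nearly trivial; whole-word masking removes that shortcut.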

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Chinese Sentence Pair Classification | BQ | RoBERTa-wwm-ext-large | F1 | 85.8 | 1 |
| Chinese Sentence Pair Classification | BQ Dev | RoBERTa-wwm-ext-large | F1 | 86.3 | 1 |
| Sentiment Analysis | ChnSentiCorp | RoBERTa-wwm-ext-large | F1 | 95.8 | 1 |
| Sentiment Analysis | ChnSentiCorp Dev | RoBERTa-wwm-ext-large | F1 | 95.8 | 1 |
| Chinese Reading Comprehension | CJRC | RoBERTa-wwm-ext-large | EM | 62.4 | 1 |
| Chinese Reading Comprehension | CJRC | RoBERTa-wwm-ext-large | F1 | 82.20 | 1 |
| Chinese Reading Comprehension | CJRC Dev | RoBERTa-wwm-ext-large | EM | 62.1 | 1 |
| Chinese Reading Comprehension | CJRC Dev | RoBERTa-wwm-ext-large | F1 | 82.4 | 1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) | RoBERTa-wwm-ext-large | EM | 74.2 | 1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) | RoBERTa-wwm-ext-large | F1 | 90.6 | 1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) Challenge | RoBERTa-wwm-ext-large | EM | 31.5 | 1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) Challenge | RoBERTa-wwm-ext-large | F1 | 60.1 | 1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) Dev | RoBERTa-wwm-ext-large | EM | 68.5 | 2 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) Dev | RoBERTa-wwm-ext-large | F1 | 88.4 | 1 |
| Chinese Reading Comprehension | DRCD (Traditional Chinese) | RoBERTa-wwm-ext-large | EM | 89.6 | 1 |
| Chinese Reading Comprehension | DRCD (Traditional Chinese) | RoBERTa-wwm-ext-large | F1 | 94.5 | 1 |
| Chinese Reading Comprehension | DRCD (Traditional Chinese) Dev | RoBERTa-wwm-ext-large | EM | 89.6 | 2 |
| Chinese Reading Comprehension | DRCD (Traditional Chinese) Dev | RoBERTa-wwm-ext-large | F1 | 94.8 | 1 |
| Chinese Sentence Pair Classification | LCQMC | RoBERTa-wwm-ext-large | F1 | 87 | 3 |
| Chinese Sentence Pair Classification | LCQMC Dev | RoBERTa-wwm-ext-large | F1 | 90.4 | 1 |
| Chinese Document Classification | THUCNews | RoBERTa-wwm-ext-large | F1 | 97.8 | 1 |
| Chinese Document Classification | THUCNews Dev | RoBERTa-wwm-ext-large | F1 | 98.3 | 1 |
| Chinese Sentence Pair Classification | XNLI | RoBERTa-wwm-ext-large | F1 | 81.2 | 1 |
| Chinese Sentence Pair Classification | XNLI Dev | RoBERTa-wwm-ext-large | F1 | 82.1 | 1 |

Methods used in the Paper

| Method | Type |
|--------|------|
| Residual Connection | Skip Connections |
| Attention Dropout | Regularization |
| Linear Warmup With Linear Decay | Learning Rate Schedules |
| Weight Decay | Regularization |
| GELU | Activation Functions |
| Dense Connections | Feedforward Networks |
| Adam | Stochastic Optimization |
| Softmax | Output Functions |
| Dropout | Regularization |
| WordPiece | Subword Segmentation |
| Multi-Head Attention | Attention Modules |
| Layer Normalization | Normalization |
| Scaled Dot-Product Attention | Attention Mechanisms |
| BERT | Language Models |