XLNet: Generalized Autoregressive Pretraining for Language Understanding

19 Jun 2019 · Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
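
The phrase "maximizing the expected likelihood over all permutations of the factorization order" refers to the paper's permutation language modeling objective. A sketch of that objective is shown below; the notation (with $\mathcal{Z}_T$ the set of all permutations of the index sequence $[1, 2, \dots, T]$, and $z_t$, $\mathbf{z}_{<t}$ the $t$-th element and first $t-1$ elements of a permutation $\mathbf{z}$) is chosen here for illustration and may differ from the paper's exact symbols.

$$
\max_{\theta} \;\; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T} \left[ \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]
$$

Because every position can appear in the context of every other position under some factorization order, the model learns to use bidirectional context while keeping a standard autoregressive factorization, avoiding both the [MASK] corruption and the independence assumption between masked positions in BERT's denoising objective.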


Evaluation results from the paper


Task | Dataset | Model | Metric | Value | Global rank
Text Classification | AG News | XLNet | Error | 4.49 | #1
Text Classification | Amazon-2 | XLNet | Error | 2.40 | #1
Text Classification | Amazon-5 | XLNet | Error | 32.26 | #1
Document Ranking | ClueWeb09-B | XLNet | NDCG@20 | 31.10 | #1
Document Ranking | ClueWeb09-B | XLNet | ERR@20 | 20.28 | #1
Text Classification | DBpedia | XLNet | Error | 0.62 | #1
Text Classification | IMDb | XLNet | Accuracy | 96.21 | #1
Semantic Textual Similarity | MRPC | XLNet | Accuracy | 93.0% | #1
Natural Language Inference | MultiNLI | XLNet | Matched accuracy | 90.2 | #1
Natural Language Inference | MultiNLI | XLNet | Mismatched accuracy | 89.7 | #1
Natural Language Inference | QNLI | XLNet | Accuracy | 98.6% | #1
Natural Language Inference | Quora Question Pairs | XLNet | Accuracy | 90.3 | #1
Question Answering | Quora Question Pairs | XLNet | Accuracy | 90.3% | #1
Reading Comprehension | RACE | XLNet | Accuracy | 81.75 | #1
Natural Language Inference | RTE | XLNet | Accuracy | 86.3% | #1
Question Answering | SQuAD1.1 | XLNet | EM | 89.90 | #1
Question Answering | SQuAD1.1 | XLNet | F1 | 95.08 | #1
Question Answering | SQuAD2.0 | XLNet | EM | 86.35 | #4
Question Answering | SQuAD2.0 | XLNet | F1 | 89.13 | #5
Sentiment Analysis | SST-2 Binary classification | XLNet | Accuracy | 96.8 | #1
Text Classification | Yelp-2 | XLNet | Accuracy | 98.45% | #1
Text Classification | Yelp-5 | XLNet | Accuracy | 73.20% | #1
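
The classification numbers above come from fine-tuning the pretrained model on each downstream dataset. As a rough illustration of that workflow, the sketch below fine-tunes an XLNet checkpoint for binary text classification. It assumes the Hugging Face transformers port of XLNet and the xlnet-base-cased checkpoint, neither of which is part of the paper itself; the toy data, optimizer, and hyperparameters are placeholders, not the settings used to produce the results in the table.

```python
# Minimal fine-tuning sketch for binary text classification with XLNet.
# Assumes the Hugging Face `transformers` library and PyTorch; checkpoint name
# and hyperparameters are illustrative, not the paper's configuration.
import torch
from torch.optim import AdamW
from transformers import XLNetTokenizer, XLNetForSequenceClassification

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetForSequenceClassification.from_pretrained("xlnet-base-cased", num_labels=2)

# Toy batch: two sentences with binary sentiment labels.
texts = ["the movie was a delight", "the plot made no sense"]
labels = torch.tensor([1, 0])
inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()

outputs = model(**inputs, labels=labels)  # forward pass returns loss and logits
outputs.loss.backward()                   # one gradient step on the toy batch
optimizer.step()
optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**inputs).logits.argmax(dim=-1)
print(preds.tolist())
```

In practice one would loop this over mini-batches of the target dataset (IMDb, Yelp, AG News, etc.) for a few epochs and evaluate accuracy or error on the held-out test split.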