Cross-lingual Language Model Pretraining

Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two objectives for learning cross-lingual language models (XLMs): an unsupervised masked language modeling (MLM) objective that relies only on monolingual data, and a supervised translation language modeling (TLM) objective that leverages parallel data.

Published at NeurIPS 2019.
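The two pretraining objectives named in the abstract can be illustrated with a minimal sketch. The snippet below is not the released XLM code: the [MASK] token, the 15% masking rate, the sentence separator, and all function names are illustrative assumptions; it only shows that MLM masks a monolingual stream while TLM masks a concatenated translation pair so the model can use the other language as context.

    # Hypothetical sketch of the MLM and TLM objectives (not the authors' implementation).
    import random

    MASK = "[MASK]"

    def mask_tokens(tokens, mask_prob=0.15):
        """Randomly replace tokens with [MASK]; return model inputs and prediction targets."""
        inputs, targets = [], []
        for tok in tokens:
            if random.random() < mask_prob:
                inputs.append(MASK)
                targets.append(tok)      # the model must recover the original token here
            else:
                inputs.append(tok)
                targets.append(None)     # no loss on unmasked positions
        return inputs, targets

    def mlm_example(monolingual_sentence):
        """MLM: mask a monolingual token stream (unsupervised, one language at a time)."""
        return mask_tokens(monolingual_sentence)

    def tlm_example(src_sentence, tgt_sentence):
        """TLM: concatenate a translation pair and mask tokens on both sides,
        so predictions can rely on the aligned sentence in the other language."""
        return mask_tokens(src_sentence + ["</s>"] + tgt_sentence)

    if __name__ == "__main__":
        print(mlm_example("the cat sat on the mat".split()))
        print(tlm_example("the cat sat".split(), "le chat est assis".split()))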
Task | Dataset | Model | Metric | Value | Global Rank
Unsupervised Machine Translation | WMT2014 English-French | MLM pretraining for encoder and decoder | BLEU | 33.4 | #3
Unsupervised Machine Translation | WMT2014 French-English | MLM pretraining for encoder and decoder | BLEU | 33.3 | #4
Unsupervised Machine Translation | WMT2016 English-German | MLM pretraining for encoder and decoder | BLEU | 26.4 | #4
Unsupervised Machine Translation | WMT2016 English-Romanian | MLM pretraining for encoder and decoder | BLEU | 33.3 | #2
Unsupervised Machine Translation | WMT2016 German-English | MLM pretraining for encoder and decoder | BLEU | 34.3 | #4
Unsupervised Machine Translation | WMT2016 Romanian-English | MLM pretraining for encoder and decoder | BLEU | 31.8 | #3
Machine Translation | WMT2016 Romanian-English | MLM pretraining | BLEU | 35.3 | #2
Natural Language Inference | XNLI French | XLM (MLM+TLM) | Accuracy | 80.2 | #2

Methods used in the Paper


Method | Type
Multi-Head Attention | Attention Modules
Residual Connection | Skip Connections
Attention Dropout | Regularization
BPE | Subword Segmentation
GELU | Activation Functions
Dense Connections | Feedforward Networks
Adam | Stochastic Optimization
Softmax | Output Functions
Dropout | Regularization
Layer Normalization | Normalization
Scaled Dot-Product Attention | Attention Mechanisms
XLM | Transformers
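Several of the components listed above (Scaled Dot-Product Attention, Multi-Head Attention, Softmax) are the standard Transformer building blocks used by XLM. For reference, here is a minimal NumPy sketch of scaled dot-product attention; it is a generic textbook formulation, not the paper's implementation, and the shapes in the example are arbitrary.

    # Generic scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)      # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)      # softmax over key positions
        return weights @ V                                   # weighted sum of values

    # Example: 4 positions with 8-dimensional queries, keys, and values.
    q = np.random.randn(4, 8)
    k = np.random.randn(4, 8)
    v = np.random.randn(4, 8)
    out = scaled_dot_product_attention(q, k, v)              # shape (4, 8)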