XLM

Introduced by Lample et al. in Cross-lingual Language Model Pretraining

XLM is a Transformer-based architecture that is pre-trained using one of three language modeling objectives:

Causal Language Modeling (CLM) - models the probability of a word given the previous words in a sentence.
Masked Language Modeling (MLM) - the masked language modeling objective of BERT, in which randomly masked tokens are predicted from their context.
Translation Language Modeling (TLM) - a new objective that extends masked language modeling to pairs of parallel sentences to improve cross-lingual pre-training.
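
As a rough illustration of how the three objectives differ in the inputs and targets they construct, here is a minimal Python sketch. It is not the authors' implementation. The function names, the `[MASK]` and `</s>` symbols, and the 15% selection rate with an 80/10/10 replacement split (borrowed from BERT) are assumptions made for the example. For translation language modeling, the sketch only builds the concatenated input with per-token language labels and position indices that restart for the target sentence; masking would then be applied to both halves in the same way as in masked language modeling.

```python
import random

MASK = "[MASK]"  # assumed mask symbol, for illustration only


def clm_examples(tokens):
    """Causal LM: pair every prefix with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mlm_mask(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style masking.

    Returns the corrupted sequence and a dict mapping each selected
    position to the original token the model should predict.
    """
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        targets[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # replace with the mask symbol
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # replace with a random token
        # otherwise keep the original token unchanged
    return corrupted, targets


def tlm_input(src_tokens, tgt_tokens, src_lang, tgt_lang, sep="</s>"):
    """Translation LM: concatenate a parallel sentence pair.

    Each token gets a language label, and position indices restart
    at 0 for the target sentence. The separator is an assumption.
    """
    tokens = src_tokens + [sep] + tgt_tokens
    langs = [src_lang] * (len(src_tokens) + 1) + [tgt_lang] * len(tgt_tokens)
    positions = list(range(len(src_tokens) + 1)) + list(range(len(tgt_tokens)))
    return tokens, langs, positions


if __name__ == "__main__":
    rng = random.Random(0)
    en = ["the", "cat", "sleeps"]
    fr = ["le", "chat", "dort"]
    print(clm_examples(en))
    print(mlm_mask(en, vocab=en + fr, rng=rng))
    print(tlm_input(en, fr, "en", "fr"))
```

Because the translation input contains both sentences, a masked word in one language can be predicted from its context in either language, which is what encourages the model to align representations across languages.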

The authors find that both the CLM and MLM approaches provide strong cross-lingual features that can be used for pretraining models.

Source: Cross-lingual Language Model Pretraining

Component	Type
Adam	Stochastic Optimization
Attention Dropout	Regularization
BPE	Subword Segmentation
Dense Connections	Feedforward Networks
Dropout	Regularization
GELU	Activation Functions
Layer Normalization	Normalization
Multi-Head Attention	Attention Modules
Residual Connection	Skip Connections
Scaled Dot-Product Attention	Attention Mechanisms
Softmax	Output Functions