VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation

30 Oct 2020 · Fuli Luo, Wei Wang, Jiahao Liu, Yijia Liu, Bin Bi, Songfang Huang, Fei Huang, Luo Si

Recent studies on learning multilingual representations have achieved significant performance gains across a wide range of downstream cross-lingual tasks. They train either an encoder-only Transformer mainly for understanding tasks, or an encoder-decoder Transformer specifically for generation tasks, ignoring the correlation between the two tasks and frameworks...
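The contrast the abstract draws between the two pre-training setups can be made concrete with a minimal PyTorch sketch. This is a generic illustration using torch.nn building blocks, not the paper's architecture; the hyperparameters (d_model=512, 8 heads, 6 layers) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Encoder-only stack: the kind typically pre-trained for understanding tasks.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder_only = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Encoder-decoder stack: the kind typically pre-trained for generation tasks.
encoder_decoder = nn.Transformer(
    d_model=512, nhead=8,
    num_encoder_layers=6, num_decoder_layers=6,
    batch_first=True,
)

src = torch.randn(2, 10, 512)  # (batch, source length, d_model)
tgt = torch.randn(2, 7, 512)   # (batch, target length, d_model)

enc_out = encoder_only(src)          # understanding: contextual representations of the source
gen_out = encoder_decoder(src, tgt)  # generation: decoder cross-attends to the encoder output
print(enc_out.shape, gen_out.shape)  # torch.Size([2, 10, 512]) torch.Size([2, 7, 512])
```

VECO's point is that these two setups are usually trained separately; the sketch only shows the structural difference between them.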

Methods used in the Paper

METHOD                         TYPE
Multi-Head Attention           Attention Modules
Scaled Dot-Product Attention   Attention Mechanisms
Layer Normalization            Normalization
Residual Connection            Skip Connections
Dropout                        Regularization
Adam                           Stochastic Optimization
BPE                            Subword Segmentation
Softmax                        Output Functions
Dense Connections              Feedforward Networks
Label Smoothing                Regularization
Transformer                    Transformers
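Several of the listed components (Multi-Head Attention, Scaled Dot-Product Attention, Softmax) reduce to the standard attention formula Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal sketch of that computation; it is a generic reference implementation, not code from the paper, and the tensor shapes are illustrative.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(QK^T / sqrt(d_k)) V over the last two dimensions."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Toy multi-head shapes: batch of 2, 8 heads, sequence length 10, head dimension 64.
q = torch.randn(2, 8, 10, 64)
k = torch.randn(2, 8, 10, 64)
v = torch.randn(2, 8, 10, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 10, 64])
```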