Parallel Corpus Filtering via Pre-trained Language Models

Web-crawled data provides a good source of parallel corpora for training machine translation models. It is automatically obtained, but extremely noisy, and recent work shows that neural machine translation systems are more sensitive to noise than traditional statistical machine translation methods…

PDF Abstract (ACL 2020)
No code implementations yet.
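
Since no implementation is listed here, below is a loose, illustrative sketch of the general idea named in the title: scoring candidate sentence pairs with a pre-trained multilingual language model and keeping only pairs whose source and target sides look parallel. This is not the authors' released code; the model name (`bert-base-multilingual-cased`), mean pooling, and the 0.5 similarity threshold are assumptions chosen for illustration.

```python
# Illustrative sketch only: filter noisy sentence pairs by cosine similarity of
# multilingual BERT sentence embeddings. Model, pooling, and threshold are assumed.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # assumed multilingual pre-trained LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the last hidden states into a single sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state          # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1).float()    # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)      # (1, hidden)

def parallelism_score(src: str, tgt: str) -> float:
    """Cosine similarity between source and target sentence embeddings."""
    return torch.nn.functional.cosine_similarity(embed(src), embed(tgt)).item()

# Keep only pairs above an (assumed) similarity threshold.
pairs = [("Hello world.", "Bonjour le monde."), ("Hello world.", "12345 buy now!!!")]
filtered = [(s, t) for s, t in pairs if parallelism_score(s, t) > 0.5]
print(filtered)
```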

Datasets


TASK                  DATASET                    MODEL   METRIC NAME   METRIC VALUE   GLOBAL RANK
Machine Translation   WMT2019 English-Japanese   fiore   BLEU          527424878      #1

Methods used in the Paper


METHOD                            TYPE
Weight Decay                      Regularization
Softmax                           Output Functions
Adam                              Stochastic Optimization
Multi-Head Attention              Attention Modules
Dropout                           Regularization
GELU                              Activation Functions
Attention Dropout                 Regularization
Linear Warmup With Linear Decay   Learning Rate Schedules
Dense Connections                 Feedforward Networks
Layer Normalization               Normalization
Scaled Dot-Product Attention      Attention Mechanisms
WordPiece                         Subword Segmentation
Residual Connection               Skip Connections
BERT                              Language Models
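
As a quick reference for one of the components listed above, here is a minimal sketch of Scaled Dot-Product Attention, the operation inside Multi-Head Attention: softmax(QK^T / sqrt(d_k)) V. The tensor shapes and the random-tensor usage example are illustrative, not taken from the paper.

```python
# Minimal sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k); mask broadcastable to the score matrix."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution over key positions
    return torch.matmul(weights, v), weights

# Tiny usage example with random tensors (batch=2, heads=8, seq_len=10, d_k=64).
q = k = v = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 8, 10, 64]) torch.Size([2, 8, 10, 10])
```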