Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

ACL 2019 • Mikel Artetxe • Holger Schwenk

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora. In this paper, we propose a new method for this task based on multilingual sentence embeddings...

| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Cross-Lingual Bitext Mining | BUCC French-to-English | Multilingual Sentence Embeddings | F1 score | 92.89 | # 2 |
| Cross-Lingual Bitext Mining | BUCC German-to-English | Multilingual Sentence Embeddings | F1 score | 95.58 | # 2 |
