About

Cross-lingual bitext mining is the task of mining sentence pairs that are translations of each other from large text corpora.
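
As a rough illustration of the task, the sketch below mines pairs from sentence embeddings with a ratio-margin score: the cosine similarity of a candidate pair divided by the average similarity of each sentence to its k nearest neighbours in the other language. It is a simplified toy, not the implementation of any paper listed here; the vectors, the function names `cosine` and `mine_pairs`, and the values of `k` and `threshold` are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mine_pairs(src_vecs, tgt_vecs, k=2, threshold=1.0):
    """Return (src_index, tgt_index, score) for each source sentence whose
    best-scoring target passes the (hypothetical) threshold."""
    # Full similarity matrix: rows are source sentences, columns are targets.
    sims = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]

    # Average similarity of each sentence to its k nearest neighbours
    # on the other side.
    src_avg = [sum(sorted(row, reverse=True)[:k]) / k for row in sims]
    tgt_avg = [sum(sorted(col, reverse=True)[:k]) / k for col in zip(*sims)]

    pairs = []
    for i, row in enumerate(sims):
        # Ratio margin: pair similarity relative to the neighbourhood average.
        scores = [row[j] / ((src_avg[i] + tgt_avg[j]) / 2) for j in range(len(row))]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best, scores[best]))
    return pairs


# Made-up 3-dimensional "embeddings": two source sentences, three targets.
src_vecs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1]]
tgt_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.5, 0.5, 0.5]]

pairs = mine_pairs(src_vecs, tgt_vecs)
for src_idx, tgt_idx, score in pairs:
    print(src_idx, tgt_idx, round(score, 3))
```

Dividing by the neighbourhood average penalises "hub" sentences that are moderately similar to everything, which a plain cosine threshold would not do. In a real setting the vectors would come from a multilingual sentence encoder and the nearest-neighbour search would use an approximate index rather than a full similarity matrix.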

Greatest papers with code

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

ACL 2019 facebookresearch/LASER

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.

CROSS-LINGUAL BITEXT MINING, MACHINE TRANSLATION, PARALLEL CORPUS MINING, SENTENCE EMBEDDINGS

Improving Neural Machine Translation Models with Monolingual Data

ACL 2016 surafelml/Afro-NMT

Neural Machine Translation (NMT) has obtained state-of-the-art performance for several language pairs, while only using parallel data for training.

CROSS-LINGUAL BITEXT MINING, LANGUAGE MODELLING, MACHINE TRANSLATION

Majority Voting with Bidirectional Pre-translation For Bitext Retrieval

10 Mar 2021 AlexJonesNLP/alt-bitexts

Obtaining high-quality parallel corpora is of paramount importance for training NMT systems.

CROSS-LINGUAL BITEXT MINING