Cross-Lingual Bitext Mining

2 papers with code · Natural Language Processing

Greatest papers with code

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

26 Dec 2018 · facebookresearch/LASER

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different language families and written in 28 different scripts. We also introduce a new test set of aligned sentences in 122 languages based on the Tatoeba corpus, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages.

CROSS-LINGUAL BITEXT MINING · CROSS-LINGUAL DOCUMENT CLASSIFICATION · CROSS-LINGUAL NATURAL LANGUAGE INFERENCE · CROSS-LINGUAL TRANSFER · DOCUMENT CLASSIFICATION · JOINT MULTILINGUAL SENTENCE REPRESENTATIONS · PARALLEL CORPUS MINING
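
The multilingual similarity search mentioned in the abstract above can be illustrated with a minimal sketch: given embeddings for aligned sentences in two languages, retrieve the nearest target sentence for each source sentence by cosine similarity and count the misses. The toy vectors, the function names, and the choice of error rate as the metric are assumptions made for illustration; the sketch does not call the LASER encoder or reproduce its evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbor(query, candidates):
    """Index of the candidate vector most similar to the query."""
    return max(range(len(candidates)), key=lambda i: cosine(query, candidates[i]))

def similarity_search_error_rate(src_vecs, tgt_vecs):
    """Fraction of source sentences whose nearest target is not the aligned one.

    Assumes src_vecs[i] and tgt_vecs[i] embed translations of each other.
    """
    errors = sum(
        1 for i, vec in enumerate(src_vecs) if nearest_neighbor(vec, tgt_vecs) != i
    )
    return errors / len(src_vecs)

# Hypothetical 3-dimensional vectors standing in for real sentence embeddings.
src = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
tgt = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.7, 0.1, 0.6]]

error_rate = similarity_search_error_rate(src, tgt)
print(f"similarity search error rate: {error_rate:.2f}")
```

With real embeddings, the brute-force loop would normally be replaced by an approximate or batched nearest-neighbor index, since comparing every pair scales quadratically with corpus size.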

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

3 Nov 2018 · facebookresearch/LASER

We propose a new method for parallel corpus mining based on multilingual sentence embeddings. Our approach uses an encoder-decoder trained over an initial parallel corpus to build multilingual sentence representations, which are then incorporated into a new margin-based method to score, mine and filter parallel sentences.

CROSS-LINGUAL BITEXT MINING · MACHINE TRANSLATION · PARALLEL CORPUS MINING · SENTENCE EMBEDDINGS
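
The margin-based scoring idea in the abstract above can be sketched as follows. The abstract does not spell out the formula, so this sketch assumes a ratio-style margin: the cosine similarity of a candidate pair divided by the average similarity of each sentence to its k nearest neighbors in the other language. The toy vectors, the function names, the value of k, and the threshold are all hypothetical, and the code is not taken from the LASER repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_mean_sim(vec, pool, k):
    """Mean cosine similarity between vec and its k nearest neighbors in pool."""
    sims = sorted((cosine(vec, other) for other in pool), reverse=True)
    return sum(sims[:k]) / k

def margin_scores(src_vecs, tgt_vecs, k=2):
    """Assumed ratio margin: cos(x, y) over the mean of both k-NN averages."""
    src_avg = [knn_mean_sim(s, tgt_vecs, k) for s in src_vecs]
    tgt_avg = [knn_mean_sim(t, src_vecs, k) for t in tgt_vecs]
    return [
        [cosine(s, t) / ((src_avg[i] + tgt_avg[j]) / 2) for j, t in enumerate(tgt_vecs)]
        for i, s in enumerate(src_vecs)
    ]

def mine(src_vecs, tgt_vecs, k=2, threshold=1.0):
    """Score all pairs, keep each source's best target, filter by threshold."""
    pairs = []
    for i, row in enumerate(margin_scores(src_vecs, tgt_vecs, k)):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            pairs.append((i, j, row[j]))
    return pairs

# Hypothetical non-negative vectors standing in for real sentence embeddings.
src = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
tgt = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.7, 0.1, 0.6]]

for i, j, score in mine(src, tgt, k=2, threshold=1.0):
    print(f"source {i} <-> target {j}: margin score {score:.3f}")
```

Normalizing by neighborhood similarity is meant to penalize sentences that are close to many candidates at once, which a fixed cosine threshold cannot do. Raising the threshold trades recall for precision, so in practice it would be tuned on held-out data.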