Parallel Corpus Mining

8 papers with code • 0 benchmarks • 1 dataset

Mining a corpus of bilingual sentence pairs that are translations of each other.

Latest papers with no code

Better Quality Estimation for Low Resource Corpus Mining

no code yet • Findings (ACL) 2022

We show that state-of-the-art QE models, when tested in a Parallel Corpus Mining (PCM) setting, perform unexpectedly badly due to a lack of robustness to out-of-domain examples.

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

no code yet • ACL 2020

Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages.

Unsupervised Parallel Corpus Mining on Web Data

no code yet • 18 Sep 2020

In contrast, a large amount of human-created parallel text is available on the Internet.

Hierarchical Document Encoder for Parallel Corpus Mining

no code yet • WS 2019

We explore using multilingual document embeddings for nearest neighbor mining of parallel data.

Effective Parallel Corpus Mining using Bilingual Sentence Embeddings

no code yet • WS 2018

This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings.
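
Several of the papers above mine parallel data by nearest-neighbor search over sentence embeddings. The general idea can be sketched as follows. This is a minimal illustration, not the method of any listed paper: the function names (`cosine`, `mine_pairs`), the mutual-nearest-neighbor filter, the 0.8 threshold, and the toy three-dimensional vectors are all assumptions made for the example. In practice the vectors would come from a multilingual sentence encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vec, candidates):
    """Index and score of the candidate most similar to vec."""
    scores = [cosine(vec, c) for c in candidates]
    j = max(range(len(scores)), key=scores.__getitem__)
    return j, scores[j]


def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """Keep (src_index, tgt_index, score) triples that are mutual
    nearest neighbors and score at least `threshold` (hypothetical value)."""
    pairs = []
    for i, s in enumerate(src_vecs):
        j, score = nearest(s, tgt_vecs)
        back, _ = nearest(tgt_vecs[j], src_vecs)
        if back == i and score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Toy embeddings, invented for illustration only.
src = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
tgt = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.1], [-1.0, 0.0, 0.0]]

pairs = mine_pairs(src, tgt)
for i, j, score in pairs:
    print(f"source {i} <-> target {j} (cosine {score:.3f})")
```

With these toy vectors, source sentences 0 and 1 are paired with targets 1 and 0, and source 2 is left unpaired because its best match falls below the threshold. The brute-force search here is quadratic in the number of sentences; mining at web scale would typically need an approximate nearest-neighbor index instead.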