Natural Language Processing

Parallel Corpus Mining

8 papers with code • 0 benchmarks • 1 dataset

Mining a corpus of bilingual sentence pairs that are translations of each other.
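As a rough illustration of the task, the sketch below pairs sentences across two languages by comparing sentence embeddings and keeping only mutual nearest neighbours above a similarity threshold. It is a minimal toy example, not the method of any paper or library listed on this page: the function names (`cosine`, `mine_pairs`), the `threshold` value, and the hand-written three-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a multilingual sentence encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """Return (src_index, tgt_index, score) for mutual nearest neighbours.

    A pair is kept only if each sentence is the other's best match and
    their cosine similarity reaches `threshold` (a hypothetical default).
    """
    if not src_vecs or not tgt_vecs:
        return []
    # Full similarity matrix: rows are source sentences, columns are target sentences.
    sims = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # Best target for every source sentence, and best source for every target sentence.
    best_tgt = [max(range(len(tgt_vecs)), key=row.__getitem__) for row in sims]
    best_src = [
        max(range(len(src_vecs)), key=lambda i: sims[i][j])
        for j in range(len(tgt_vecs))
    ]
    pairs = []
    for i, j in enumerate(best_tgt):
        if best_src[j] == i and sims[i][j] >= threshold:
            pairs.append((i, j, sims[i][j]))
    return pairs


# Toy embeddings standing in for encoder output (assumed values, not real data).
src = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
tgt = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.1], [-1.0, 0.0, 0.0]]

pairs = mine_pairs(src, tgt)
for i, j, score in pairs:
    print(f"source {i} <-> target {j} (cosine {score:.3f})")
```

With these toy vectors, source 0 pairs with target 1 and source 1 pairs with target 0, while source 2 is left unpaired because its best target already prefers another source sentence. The brute-force similarity matrix is only practical for small inputs; larger collections would typically need an approximate nearest-neighbour index.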

Benchmarks

These leaderboards are used to track progress in Parallel Corpus Mining.

No evaluation results yet.

Libraries

Use these libraries to find Parallel Corpus Mining models and implementations

facebookresearch/LASER • 2 papers • 3,520

Datasets

ASLG-PC12

Latest papers with no code

Better Quality Estimation for Low Resource Corpus Mining

no code yet • Findings (ACL) 2022

We show that state-of-the-art QE models, when tested in a Parallel Corpus Mining (PCM) setting, perform unexpectedly badly due to a lack of robustness to out-of-domain examples.

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

no code yet • ACL 2020

Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages.

Unsupervised Parallel Corpus Mining on Web Data

no code yet • 18 Sep 2020

A large amount of human-created parallel text is available on the Internet.

Hierarchical Document Encoder for Parallel Corpus Mining

no code yet • WS 2019

We explore using multilingual document embeddings for nearest neighbor mining of parallel data.

Effective Parallel Corpus Mining using Bilingual Sentence Embeddings

no code yet • WS 2018

This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings.

