Parallel Corpus Mining
8 papers with code • 0 benchmarks • 1 datasets
Mining a corpus of bilingual sentence pairs that are translations of each other.
Latest papers with no code
Better Quality Estimation for Low Resource Corpus Mining
We show that state-of-the-art QE models, when tested in a Parallel Corpus Mining (PCM) setting, perform unexpectedly badly due to a lack of robustness to out-of-domain examples.
Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining
Existing multilingual sentence embedding models require large parallel data resources, which are not available for low-resource languages.
Unsupervised Parallel Corpus Mining on Web Data
A large volume of parallel text created by humans is available on the Internet.
Hierarchical Document Encoder for Parallel Corpus Mining
We explore using multilingual document embeddings for nearest neighbor mining of parallel data.
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings.
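Several of the entries above describe mining parallel data by nearest-neighbor search over sentence embeddings. The following is a minimal sketch of that general idea, not the method of any listed paper. It assumes the embeddings have already been produced by some multilingual encoder (not shown), and the function names, the mutual-nearest-neighbor filter, and the 0.8 threshold are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """Return (src_index, tgt_index, score) for candidate translation pairs.

    A pair is kept only if each sentence is the other's nearest neighbor
    and the similarity reaches the threshold (assumed value, tune per task).
    """
    if not src_vecs or not tgt_vecs:
        return []
    # Full similarity matrix; fine for a toy example, too slow for web scale.
    sims = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    n_src, n_tgt = len(src_vecs), len(tgt_vecs)
    # Nearest target for every source sentence.
    best_tgt = [max(range(n_tgt), key=lambda j: row[j]) for row in sims]
    # Nearest source for every target sentence.
    best_src = [max(range(n_src), key=lambda i: sims[i][j]) for j in range(n_tgt)]
    return [
        (i, j, sims[i][j])
        for i, j in enumerate(best_tgt)
        if best_src[j] == i and sims[i][j] >= threshold
    ]


# Toy 2-dimensional "embeddings" standing in for real encoder output.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.0, 2.0], [3.0, 0.1]]
for i, j, score in mine_pairs(src, tgt):
    print(f"source {i} <-> target {j} (cosine {score:.3f})")
```

In this toy run the third source sentence is dropped because its nearest target already pairs more strongly with another source sentence. At realistic corpus sizes the exhaustive similarity matrix would normally be replaced by an approximate nearest-neighbor index.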