Texts

BUCC (Building and Using Comparable Corpora)

Introduced by Zweigenbaum et al. in Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora

The BUCC mining task is a shared task on extracting parallel sentences from two monolingual corpora, a subset of which is assumed to be parallel. It has been available since 2016. For each language pair, the shared task provides a monolingual corpus for each language and a gold mapping list of true translation pairs, which serves as the ground truth. The task is to construct a list of translation pairs from the monolingual corpora. The constructed list is compared against the ground truth and evaluated with the F1 measure.

Source: Language-agnostic BERT Sentence Embedding
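The F1 evaluation described above can be sketched as follows. This is a minimal illustration, not the official BUCC scorer: it assumes each translation pair is represented as a tuple of two sentence identifiers, and the function name `score_pairs` and the identifier format (`de-1`, `en-7`) are hypothetical.

```python
from typing import Iterable, Tuple

# A candidate translation pair: (source sentence id, target sentence id).
# The id format used below is made up for illustration.
Pair = Tuple[str, str]


def score_pairs(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted pairs against the gold pairs."""
    predicted_set = set(predicted)  # duplicates in the submission count once
    gold_set = set(gold)

    # A predicted pair is correct only if it appears exactly in the gold list.
    true_positives = len(predicted_set & gold_set)

    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("de-1", "en-7"), ("de-2", "en-3"), ("de-5", "en-9"), ("de-8", "en-2")]
    predicted = [("de-1", "en-7"), ("de-2", "en-3"), ("de-4", "en-4")]
    precision, recall, f1 = score_pairs(predicted, gold)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy example two of the three predicted pairs are in the gold list of four, so precision is 2/3, recall is 1/2, and F1 is 4/7.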

Benchmarks

Task	Dataset Variant	Best Model
Cross-Lingual Bitext Mining	BUCC German-to-English	Massively Multilingual Sentence Embeddings
Cross-Lingual Bitext Mining	BUCC French-to-English	Massively Multilingual Sentence Embeddings
Cross-Lingual Bitext Mining	BUCC Russian-to-English	Massively Multilingual Sentence Embeddings
Cross-Lingual Bitext Mining	BUCC Chinese-to-English	Massively Multilingual Sentence Embeddings