CzEng 2.0 Parallel Corpus

Introduced by Kocmi et al. in Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords

CzEng 2.0 is a Czech-English parallel corpus consisting of over 2 billion words (2 "gigawords") in each language. The corpus contains document-level information and is filtered with several techniques to reduce noise.

Source: Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords

License


  • Unknown
