CzEng 2.0 is a Czech-English parallel corpus of over 2 billion words (2 "gigawords") in each language. The corpus includes document-level information and has been filtered with several techniques to reduce the amount of noise.
Source: Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords