PolyNews is a multilingual parallel dataset containing news titles for 833 language pairs, spanning 64 languages and 17 scripts.
PolyNewsParallel aims to provide an easily accessible, unified, and de-duplicated dataset that combines three disparate data sources. It can be used for machine translation or text retrieval in both high-resource and low-resource languages.
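To illustrate what combining and de-duplicating parallel data from several sources can involve, here is a minimal sketch. It is not the dataset authors' pipeline: the function names `normalize` and `merge_and_deduplicate`, the normalization rule (NFKC, case folding, whitespace collapsing), and the toy sentence pairs are all assumptions chosen for illustration.

```python
import unicodedata


def normalize(text):
    """Hypothetical matching key: NFKC-normalize, casefold, collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def merge_and_deduplicate(*sources):
    """Merge lists of (source_title, target_title) pairs, keeping the first
    occurrence of each pair whose normalized form has not been seen yet."""
    seen = set()
    merged = []
    for source in sources:
        for src, tgt in source:
            key = (normalize(src), normalize(tgt))
            if key in seen:
                continue  # near-identical pair already kept from an earlier source
            seen.add(key)
            merged.append((src, tgt))
    return merged


# Toy data standing in for three separate sources (invented for this example).
source_a = [("Hello world", "Bonjour le monde")]
source_b = [("hello  world", "bonjour le monde"), ("Good morning", "Bonjour")]
source_c = [("Good morning", "Bonjour"), ("Thank you", "Merci")]

pairs = merge_and_deduplicate(source_a, source_b, source_c)
```

Keeping the first occurrence preserves the original casing and spacing of the retained pair, while the normalized key only decides whether two pairs count as duplicates. A real pipeline would likely need language-specific normalization, which this sketch does not attempt.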