WikiAnn is a dataset for cross-lingual name tagging and linking based on Wikipedia articles in 295 languages.
54 PAPERS • 7 BENCHMARKS
The first parallel corpus composed from United Nations documents to be published by the original data creator. It consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.
17 PAPERS • NO BENCHMARKS YET