WikiANN, also known as PAN-X, is a multilingual named entity recognition dataset. It consists of Wikipedia articles that have been annotated with LOC (location), PER (person), and ORG (organization) tags in the IOB2 format¹². This dataset serves as a valuable resource for training and evaluating named entity recognition models across various languages.
65 PAPERS • 3 BENCHMARKS
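As a rough illustration of the IOB2 scheme mentioned in the WikiANN entry, the sketch below groups per-token tags back into entity spans. The sentence and its tags are made up for this example and are not taken from the dataset, and the helper name `iob2_to_spans` is hypothetical. It assumes the usual convention that `B-` opens an entity, `I-` continues an entity of the same type, and `O` marks a token outside any entity.

```python
def iob2_to_spans(tokens, tags):
    """Group IOB2-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type = None
    current_tokens = []

    def close_span():
        # Emit the entity collected so far, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one.
            close_span()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag with no matching open entity) ends the entity.
            # Assumption: stray I- tags are treated as outside and dropped.
            close_span()
            current_type = None
            current_tokens = []

    close_span()
    return spans


# Made-up example sentence, tagged by hand for illustration only.
tokens = ["Angela", "Merkel", "visited", "the", "United", "Nations", "in", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "I-LOC"]
spans = iob2_to_spans(tokens, tags)
print(spans)
```

Treating a stray `I-` tag as outside any entity is one possible policy; some tools instead promote it to the start of a new entity.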
PolyNews is a multilingual dataset containing news titles in 77 languages and 19 scripts.
1 PAPER • NO BENCHMARKS YET
PolyNews is a multilingual parallel dataset containing news titles in 833 language pairs, spanning 64 languages and 17 scripts.