WikiAnn is a dataset for cross-lingual name tagging and linking based on Wikipedia articles in 295 languages.
25 PAPERS • 7 BENCHMARKS
XL-Sum is a comprehensive and diverse dataset for abstractive summarization comprising 1 million professionally annotated article-summary pairs from the BBC, extracted using a set of carefully designed heuristics. The dataset covers 44 languages ranging from low- to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation.
15 PAPERS • 1 BENCHMARK