WikiAnn is a dataset for cross-lingual name tagging and linking based on Wikipedia articles in 295 languages.
11 PAPERS • 3 BENCHMARKS
The Cherokee-English Parallel Dataset is a low-resource corpus of 14,151 sentence pairs, with around 313K English tokens and 206K Cherokee tokens. The parallel corpus is accompanied by a monolingual Cherokee dataset of 5,120 sentences. Both datasets are mostly derived from Cherokee monolingual books.
1 PAPER • NO BENCHMARKS YET