Glot500-c (Glot500 Corpus)

A dataset of natural language data collected by putting together more than 150 existing mono-lingual and multilingual datasets together and crawling known multilingual websites. The focus of this dataset is on 500 extremely low-resource languages.

Github: https://github.com/cisnlp/Glot500

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


License


Modalities


Languages