NusaCrowd

Introduced by Cahyawijaya et al. in NusaCrowd: Open Source Initiative for Indonesian NLP Resources

NusaCrowd is a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, the authors have brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments.

Source: https://arxiv.org/pdf/2212.09648v2.pdf
