IndicNLP Corpus

The IndicNLP corpus is a large-scale, general-domain corpus containing 2.7 billion words for 10 Indian languages from two language families.

Source: https://arxiv.org/abs/2005.00085

License

  • Unknown
