The IndicNLP corpus is a large-scale, general-domain corpus containing 2.7 billion words for 10 Indian languages from two language families.