2 code implementations • 30 Apr 2020 • Anoop Kunchukuttan, Divyanshu Kakwani, Satish Golla, Gokul N. C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar
We present the IndicNLP corpus, a large-scale, general-domain corpus containing 2.7 billion words for 10 Indian languages from two language families.