The text corpus contains running text from various freely licensed sources:

- The whole content of Malayalam Wikipedia, extracted on January 1, 2019
- News and articles from various sources; the source is mentioned in the respective files

Size:

- 251 MB
- 8,60,159 lines
- 98,15,533 words
- 10,11,11,885 characters

The word corpus contains:

- A classified lexicon prepared for the Malayalam Morphology Analyser project
- Unique words extracted from Malayalam Wikipedia, Wiktionary, etc.

Size:

- 14,27,392 words