1 code implementation • 13 Sep 2021 • Charangan Vasantharajan, Laksika Tharmalingam, Uthayasanker Thayasivam
Since Tamil and Sinhala are low-resource languages, we improved the performance of Tesseract by employing LSTM-based training on more than 20 legacy fonts to recognize printed characters in these languages.
1 code implementation • 24 Aug 2021 • Charangan Vasantharajan, Uthayasanker Thayasivam
The experimental results showed that ULMFiT is the best model for this task.
no code implementations • EACL (DravidianLangTech) 2021 • Charangan Vasantharajan, Uthayasanker Thayasivam
Code-mixed offensive content has been used pervasively in social media posts over the last few years.
no code implementations • 18 Nov 2021 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Sajeetha Thavareesan, Dhivya Chinnappa, Durairaj Thenmozhi, Elizabeth Sherly, John P. McCrae, Adeep Hande, Rahul Ponnusamy, Shubhanker Banerjee, Charangan Vasantharajan
We received 22 systems for Tamil-English, 15 systems for Malayalam-English, and 15 for Kannada-English.
no code implementations • 9 Feb 2022 • Charangan Vasantharajan, Sean Benhur, Prasanna Kumar Kumarasen, Rahul Ponnusamy, Sathiyaraj Thangasamy, Ruba Priyadharshini, Thenmozhi Durairaj, Kanchana Sivanraju, Anbukkarasi Sampath, Bharathi Raja Chakravarthi, John Phillip McCrae
Our MURIL-base model has achieved a 0.60 macro average F1-score across our 3-class group dataset.