no code implementations • ICON 2021 • Aloka Fernando, Gihan Dias
The word frequency list and the verified word list are the largest collections of word lists that are available for the Sinhala language.
no code implementations • PAIL (ICON) 2021 • Dineskumar Murugesapillai, Anankan Ravinthirarasa, Gihan Dias, Kengatharaiyer Sarveswaran
This paper describes the ongoing development of a grammar error checker for the Tamil language using a state-of-the-art deep neural-based approach.
no code implementations • 7 Jul 2021 • Upuli Liyanapathirana, Kaumini Gunasinghe, Gihan Dias
The errors in a corpus of Sinhala documents were analysed and commonly misspelled words and types of common errors were identified.
2 code implementations • ICON 2020 • Kengatharaiyer Sarveswaran, Gihan Dias
ThamizhiUDp uses Stanza for tokenisation and lemmatisation, ThamizhiPOSt and ThamizhiMorph for generating Part of Speech (POS) and Morphological annotations, and uuparser with multilingual training for dependency parsing.
no code implementations • 5 Nov 2020 • Aloka Fernando, Surangika Ranathunga, Gihan Dias
This paper focuses on data augmentation techniques where bilingual lexicon terms are expanded based on case-markers with the objective of generating new words, to be used in Statistical Machine Translation (SMT).
no code implementations • WS 2019 • Kengatharaiyer Sarveswaran, Gihan Dias, Miriam Butt
This paper describes a new and larger coverage Finite-State Morphological Analyser (FSM) and Generator for the Dravidian language Tamil.
no code implementations • WS 2016 • Jcs Kadupitiya, Surangika Ranathunga, Gihan Dias
Currently, corpus-based similarity, string-based similarity, and knowledge-based similarity techniques are used to compare short phrases.
no code implementations • WS 2016 • Sandareka Fernando, Surangika Ranathunga, Sanath Jayasena, Gihan Dias
This paper presents a new comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine based Part-Of-Speech tagger for the Sinhala language.
no code implementations • WS 2016 • Riyafa Abdul Hameed, Nadeeshani Pathirennehelage, Anusha Ihalapathirana, Maryam Ziyad Mohamed, Surangika Ranathunga, Sanath Jayasena, Gihan Dias, Sandareka Fernando
A sentence aligned parallel corpus is an important prerequisite in statistical machine translation.