no code implementations • LREC 2022 • Missaka Herath, Kushan Chamindu, Hashan Maduwantha, Surangika Ranathunga
In this paper, we present a student feedback corpus, which contains 3000 instances of feedback written by university students.
no code implementations • RANLP 2021 • Shanaka Chathuranga, Surangika Ranathunga
A major challenge in analysing social media data belonging to languages that use non-English script is its code-mixed nature.
1 code implementation • LREC 2022 • Vinura Dhananjaya, Piyumal Demotte, Surangika Ranathunga, Sanath Jayasena
We test on a set of different Sinhala text classification tasks and our analysis shows that out of the pre-trained multilingual models that include Sinhala (XLM-R, LaBSE, and LASER), XLM-R is the best model by far for Sinhala text classification.
no code implementations • 25 Mar 2024 • Bastin Tony Roy Savarimuthu, Surangika Ranathunga, Stephen Cranefield
This paper thus aims to foster collaboration between MAS, NLP and LLM researchers in order to advance the field of normative agents.
1 code implementation • 12 Feb 2024 • Surangika Ranathunga, Nisansa de Silva, Menan Velayuthan, Aloka Fernando, Charitha Rathnayake
We conducted a detailed analysis on the quality of web-mined corpora for two low-resource languages (making three language pairs, English-Sinhala, English-Tamil and Sinhala-Tamil).
1 code implementation • 2 Jun 2023 • Shravan Nayak, Surangika Ranathunga, Sarubi Thillainathan, Rikki Hung, Anthony Rinaldi, Yining Wang, Jonah Mackey, Andrew Ho, En-Shiun Annie Lee
In this paper, we show that intermediate-task fine-tuning (ITFT) of PMSS models is extremely beneficial for domain-specific NMT, especially when target domain data is limited/unavailable and the considered languages are missing or under-represented in the PMSS model.
1 code implementation • 16 Oct 2022 • Surangika Ranathunga, Nisansa de Silva
Using an existing language categorisation based on speaker population and vitality, we analyse the distribution of language data resources, amount of NLP/CL research, inclusion in multilingual web-based platforms and the inclusion in pre-trained multilingual models.
no code implementations • 16 Aug 2022 • Vinura Dhananjaya, Piyumal Demotte, Surangika Ranathunga, Sanath Jayasena
We test on a set of different Sinhala text classification tasks and our analysis shows that out of the pre-trained multilingual models that include Sinhala (XLM-R, LaBSE, and LASER), XLM-R is the best model by far for Sinhala text classification.
no code implementations • 18 May 2022 • Aloka Fernando, Surangika Ranathunga
However, existing DA techniques have addressed only one of these OOV types and are limited to considering either syntactic constraints or semantic constraints.
no code implementations • Findings (ACL) 2022 • En-Shiun Annie Lee, Sarubi Thillainathan, Shravan Nayak, Surangika Ranathunga, David Ifeoluwa Adelani, Ruisi Su, Arya D. McCarthy
What can pre-trained multilingual sequence-to-sequence models like mBART contribute to translating low-resource languages?
no code implementations • 14 Feb 2022 • Isuru Boyagane, Oshadha Katulanda, Surangika Ranathunga, Srinath Perera
Computer system log data is commonly used in system monitoring, performance characteristic investigation, workflow modeling and anomaly detection.
no code implementations • 10 Sep 2021 • Piyumal Demotte, Surangika Ranathunga
Thus, they could be considered as a viable alternative for text classification for languages that do not have pre-trained contextual embedding models.
no code implementations • RANLP 2021 • Charith Rajitha, Lakmali Piyarathne, Dilan Sachintha, Surangika Ranathunga
Document alignment techniques based on multilingual sentence representations have recently shown state-of-the-art results.
no code implementations • 29 Jun 2021 • Surangika Ranathunga, En-Shiun Annie Lee, Marjana Prifti Skenduli, Ravi Shekhar, Mehreen Alam, Rishemjit Kaur
Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less than ten years, and has already entered a mature phase.
no code implementations • 12 Jun 2021 • Dilan Sachintha, Lakmali Piyarathna, Charith Rajitha, Surangika Ranathunga
This paper presents a weighting mechanism that makes use of available small-scale parallel corpora to improve the performance of multilingual sentence representations on document and sentence alignment.
1 code implementation • 14 Nov 2020 • Lahiru Senevirathne, Piyumal Demotte, Binod Karunanayake, Udyogi Munasinghe, Surangika Ranathunga
For sentiment analysis, there exist only two previous studies that use deep learning approaches, and both focused only on document-level sentiment analysis for the binary case.
no code implementations • 5 Nov 2020 • Aloka Fernando, Surangika Ranathunga, Gihan Dias
This paper focuses on data augmentation techniques where bilingual lexicon terms are expanded based on case-markers with the objective of generating new words, to be used in Statistical Machine Translation (SMT).
no code implementations • LREC 2020 • Dimuthu Lakmal, Surangika Ranathunga, Saman Peramuna, Indu Herath
This paper presents the first-ever comprehensive evaluation of different types of word embeddings for the Sinhala language.
no code implementations • LREC 2020 • Vijini Liyanage, Surangika Ranathunga
A Mathematical Word Problem (MWP) differs from a general textual representation due to the fact that it is comprised of numerical quantities and units, in addition to text.
no code implementations • 19 Nov 2019 • Vijini Liyanage, Surangika Ranathunga
Existing approaches for automatically generating mathematical word problems are deprived of customizability and creativity due to the inherent nature of template-based mechanisms they employ.
no code implementations • ACL 2019 • Yohan Karunanayake, Uthayasanker Thayasivam, Surangika Ranathunga
Current state-of-the-art speech-based user interfaces use data-intensive methodologies to recognize free-form speech commands.
no code implementations • WS 2016 • Jcs Kadupitiya, Surangika Ranathunga, Gihan Dias
Currently, corpus-based similarity, string-based similarity, and knowledge-based similarity techniques are used to compare short phrases.
no code implementations • WS 2016 • Sandareka Fernando, Surangika Ranathunga, Sanath Jayasena, Gihan Dias
This paper presents a new comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine based Part-Of-Speech tagger for the Sinhala language.
no code implementations • WS 2016 • Riyafa Abdul Hameed, Nadeeshani Pathirennehelage, Anusha Ihalapathirana, Maryam Ziyad Mohamed, Surangika Ranathunga, Sanath Jayasena, Gihan Dias, Sandareka Fernando
A sentence aligned parallel corpus is an important prerequisite in statistical machine translation.