no code implementations • RANLP 2021 • Shanaka Chathuranga, Surangika Ranathunga
A major challenge in analysing social media data belonging to languages that use non-English script is its code-mixed nature.
1 code implementation • LREC 2022 • Vinura Dhananjaya, Piyumal Demotte, Surangika Ranathunga, Sanath Jayasena
We test on a set of different Sinhala text classification tasks and our analysis shows that out of the pre-trained multilingual models that include Sinhala (XLM-R, LaBSE, and LASER), XLM-R is the best model by far for Sinhala text classification.
no code implementations • LREC 2022 • Missaka Herath, Kushan Chamindu, Hashan Maduwantha, Surangika Ranathunga
In this paper, we present a student feedback corpus, which contains 3000 instances of feedback written by university students.
no code implementations • 28 Mar 2025 • Sarubi Thillainathan, Songchen Yuan, En-Shiun Annie Lee, Sanath Jayasena, Surangika Ranathunga
Our experiments reveal that these approaches enhance translation performance by an average of +1.47 bilingual evaluation understudy (BLEU) score compared to the standard single-stage fine-tuning baseline across all translation directions.
no code implementations • 10 Jan 2025 • Aloka Fernando, Surangika Ranathunga
In this paper, we introduce a novel masking strategy, Linguistic Entity Masking (LEM), to be used in the continual pre-training step to further improve the cross-lingual representations of existing multiPLMs.
1 code implementation • 31 Dec 2024 • Yomal De Mel, Kasun Wickramasinghe, Nisansa de Silva, Surangika Ranathunga
We propose two methods to address this problem: Our baseline is a rule-based method, which is then compared against our second method where we approach the transliteration problem as a sequence-to-sequence task akin to the established Neural Machine Translation (NMT) task.
no code implementations • 22 Dec 2024 • Charitha Rathnayake, P. R. S. Thilakarathna, Uthpala Nethmini, Rishemjit Kaur, Surangika Ranathunga
A prominent technique in this line is structure-based UBLI.
1 code implementation • 6 Dec 2024 • Thevin Senath, Kumuthu Athukorala, Ransika Costa, Surangika Ranathunga, Rishemjit Kaur
Given that the use of LLMs for this task has barely been explored, we carry out an extensive set of experiments to determine the best LLM, prompt, and fine-tuning setups.
1 code implementation • 3 Dec 2024 • Surangika Ranathunga, Asanka Ranasinghe, Janaka Shamal, Ayodya Dandeniya, Rashmi Galappaththi, Malithi Samaraweera
This paper presents a multi-way parallel English-Tamil-Sinhala corpus annotated with Named Entities (NEs), where Sinhala and Tamil are low-resource languages.
Low Resource Neural Machine Translation
1 code implementation • 2 Dec 2024 • Surangika Ranathunga, Rumesh Sirithunga, Himashi Rathnayake, Lahiru De Silva, Thamindu Aluthwala, Saman Peramuna, Ravi Shekhar
Text Simplification is a task that has been minimally explored for low-resource languages.
no code implementations • 18 Oct 2024 • Robert Spencer, Surangika Ranathunga, Mikael Boulic, Andries van Heerden, Teo Susnjak
This study investigates the application of Transfer Learning (TL) on Transformer architectures to enhance building energy consumption forecasting.
1 code implementation • 24 Jun 2024 • Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, Jing Gu, Haoran Li, Kangda Wei, ZiHao Wang, Lu Cheng, Surangika Ranathunga, Meng Fang, Jie Fu, Fei Liu, Ruihong Huang, Eduardo Blanco, Yixin Cao, Rui Zhang, Philip S. Yu, Wenpeng Yin
This study focuses on the topic of LLMs assisting NLP researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability.
no code implementations • 10 Jun 2024 • Surangika Ranathunga, Nisansa de Silva, Dilith Jayakody, Aloka Fernando
We analysed a sample of NLP research papers archived in ACL Anthology as an attempt to quantify the degree of openness and the benefit of such an open culture in the NLP community.
no code implementations • 8 Apr 2024 • Teo Susnjak, Peter Hwang, Napoleon H. Reyes, Andre L. C. Barczak, Timothy R. McIntosh, Surangika Ranathunga
This study broadens the appeal of AI-enhanced tools across various academic and research fields, setting a new standard for conducting comprehensive and accurate literature reviews with more efficiency in the face of ever-increasing volumes of academic studies.
no code implementations • 5 Apr 2024 • Tong Su, Xin Peng, Sarubi Thillainathan, David Guzmán, Surangika Ranathunga, En-Shiun Annie Lee
Parameter-efficient fine-tuning (PEFT) methods are increasingly vital in adapting large-scale pre-trained language models for diverse tasks, offering a balance between adaptability and computational efficiency.
no code implementations • 25 Mar 2024 • Bastin Tony Roy Savarimuthu, Surangika Ranathunga, Stephen Cranefield
This paper thus aims to foster collaboration between MAS, NLP and LLM researchers in order to advance the field of normative agents.
1 code implementation • 12 Feb 2024 • Surangika Ranathunga, Nisansa de Silva, Menan Velayuthan, Aloka Fernando, Charitha Rathnayake
We conducted a detailed analysis on the quality of web-mined corpora for two low-resource languages (making three language pairs, English-Sinhala, English-Tamil and Sinhala-Tamil).
1 code implementation • 2 Jun 2023 • Shravan Nayak, Surangika Ranathunga, Sarubi Thillainathan, Rikki Hung, Anthony Rinaldi, Yining Wang, Jonah Mackey, Andrew Ho, En-Shiun Annie Lee
In this paper, we show that intermediate-task fine-tuning (ITFT) of PMSS models is extremely beneficial for domain-specific NMT, especially when target domain data is limited/unavailable and the considered languages are missing or under-represented in the PMSS model.
1 code implementation • 16 Oct 2022 • Surangika Ranathunga, Nisansa de Silva
Using an existing language categorisation based on speaker population and vitality, we analyse the distribution of language data resources, amount of NLP/CL research, inclusion in multilingual web-based platforms and the inclusion in pre-trained multilingual models.
no code implementations • 16 Aug 2022 • Vinura Dhananjaya, Piyumal Demotte, Surangika Ranathunga, Sanath Jayasena
We test on a set of different Sinhala text classification tasks and our analysis shows that out of the pre-trained multilingual models that include Sinhala (XLM-R, LaBSE, and LASER), XLM-R is the best model by far for Sinhala text classification.
no code implementations • 18 May 2022 • Aloka Fernando, Surangika Ranathunga
However, existing DA techniques have addressed only one of these OOV types and are limited to considering either syntactic constraints or semantic constraints.
no code implementations • Findings (ACL) 2022 • En-Shiun Annie Lee, Sarubi Thillainathan, Shravan Nayak, Surangika Ranathunga, David Ifeoluwa Adelani, Ruisi Su, Arya D. McCarthy
What can pre-trained multilingual sequence-to-sequence models like mBART contribute to translating low-resource languages?
no code implementations • 14 Feb 2022 • Isuru Boyagane, Oshadha Katulanda, Surangika Ranathunga, Srinath Perera
Computer system log data is commonly used in system monitoring, performance characteristic investigation, workflow modeling and anomaly detection.
no code implementations • 10 Sep 2021 • Piyumal Demotte, Surangika Ranathunga
Thus, they could be considered as a viable alternative for text classification for languages that do not have pre-trained contextual embedding models.
no code implementations • RANLP 2021 • Charith Rajitha, Lakmali Piyarathne, Dilan Sachintha, Surangika Ranathunga
Document alignment techniques based on multilingual sentence representations have recently shown state-of-the-art results.
no code implementations • 29 Jun 2021 • Surangika Ranathunga, En-Shiun Annie Lee, Marjana Prifti Skenduli, Ravi Shekhar, Mehreen Alam, Rishemjit Kaur
Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less than ten years, and has already entered a mature phase.
no code implementations • 12 Jun 2021 • Dilan Sachintha, Lakmali Piyarathna, Charith Rajitha, Surangika Ranathunga
This paper presents a weighting mechanism that makes use of available small-scale parallel corpora to improve the performance of multilingual sentence representations on document and sentence alignment.
1 code implementation • 14 Nov 2020 • Lahiru Senevirathne, Piyumal Demotte, Binod Karunanayake, Udyogi Munasinghe, Surangika Ranathunga
For sentiment analysis, there exist only two previous studies that use deep learning approaches, both of which focused only on document-level sentiment analysis for the binary case.
no code implementations • 5 Nov 2020 • Aloka Fernando, Surangika Ranathunga, Gihan Dias
This paper focuses on data augmentation techniques where bilingual lexicon terms are expanded based on case-markers with the objective of generating new words, to be used in Statistical Machine Translation (SMT).
no code implementations • LREC 2020 • Vijini Liyanage, Surangika Ranathunga
A Mathematical Word Problem (MWP) differs from a general textual representation due to the fact that it is comprised of numerical quantities and units, in addition to text.
no code implementations • LREC 2020 • Dimuthu Lakmal, Surangika Ranathunga, Saman Peramuna, Indu Herath
This paper presents the first-ever comprehensive evaluation of different types of word embeddings for the Sinhala language.
no code implementations • 19 Nov 2019 • Vijini Liyanage, Surangika Ranathunga
Existing approaches for automatically generating mathematical word problems are deprived of customizability and creativity due to the inherent nature of template-based mechanisms they employ.
no code implementations • ACL 2019 • Yohan Karunanayake, Uthayasanker Thayasivam, Surangika Ranathunga
Current state-of-the-art speech-based user interfaces use data-intensive methodologies to recognize free-form speech commands.
no code implementations • WS 2016 • Riyafa Abdul Hameed, Nadeeshani Pathirennehelage, Anusha Ihalapathirana, Maryam Ziyad Mohamed, Surangika Ranathunga, Sanath Jayasena, Gihan Dias, Sandareka Fernando
A sentence-aligned parallel corpus is an important prerequisite in statistical machine translation.
no code implementations • WS 2016 • Sandareka Fernando, Surangika Ranathunga, Sanath Jayasena, Gihan Dias
This paper presents a new comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine based Part-Of-Speech tagger for the Sinhala language.
no code implementations • WS 2016 • Jcs Kadupitiya, Surangika Ranathunga, Gihan Dias
Currently, corpus-based similarity, string-based similarity, and knowledge-based similarity techniques are used to compare short phrases.