Search Results for author: Gihan Dias

ThamizhiUDp uses Stanza for tokenisation and lemmatisation, ThamizhiPOSt and ThamizhiMorph for generating Part of Speech (POS) and Morphological annotations, and uuparser with multilingual training for dependency parsing.

Dependency Parsing POS +1

Data Augmentation and Terminology Integration for Domain-Specific Sinhala-English-Tamil Statistical Machine Translation

no code implementations • 5 Nov 2020 • Aloka Fernando, Surangika Ranathunga, Gihan Dias

This paper focuses on data augmentation techniques in which bilingual lexicon terms are expanded based on case markers to generate new word forms for use in Statistical Machine Translation (SMT).
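
The expansion idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the marker pairs, the stems, and the `expand_lexicon` helper are hypothetical placeholders, and real case marking would also need morphophonemic rules that simple concatenation ignores.

```python
from typing import List, Tuple

LexiconEntry = Tuple[str, str]  # (source term, target term)

# Hypothetical placeholder markers, NOT real Sinhala or Tamil case suffixes.
# Each pair holds the source-side marker and its target-side counterpart.
CASE_MARKER_PAIRS: List[Tuple[str, str]] = [
    ("+DAT_src", "+DAT_tgt"),
    ("+GEN_src", "+GEN_tgt"),
]


def expand_lexicon(
    lexicon: List[LexiconEntry],
    marker_pairs: List[Tuple[str, str]],
) -> List[LexiconEntry]:
    """Return the original entries plus one new entry per case-marker pair."""
    expanded = list(lexicon)  # keep the base forms
    for source_term, target_term in lexicon:
        for source_marker, target_marker in marker_pairs:
            # Naive concatenation; assumed here purely for illustration.
            expanded.append(
                (source_term + source_marker, target_term + target_marker)
            )
    return expanded


lexicon = [("stem_a", "stem_x"), ("stem_b", "stem_y")]
augmented = expand_lexicon(lexicon, CASE_MARKER_PAIRS)
```

Under these assumptions, each base entry yields one extra entry per marker pair, so the augmented lexicon grows linearly with the number of markers.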

Data Augmentation Machine Translation +1

Using Meta-Morph Rules to develop Morphological Analysers: A case study concerning Tamil

no code implementations • WS 2019 • Kengatharaiyer Sarveswaran, Gihan Dias, Miriam Butt

This paper describes a new, larger-coverage Finite-State Morphological Analyser (FSM) and Generator for the Dravidian language Tamil.

MORPH Morphological Analysis

Improving domain-specific SMT for low-resourced languages using data from different domains

no code implementations • LREC 2018 • Fathima Farhath, Pranavan Theivendiram, Surangika Ranathunga, Sanath Jayasena, Gihan Dias

Domain Adaptation Language Modelling +1

Sinhala Word Joiner

no code implementations • WS 2017 • Rajith Priyanga, Surangika Ranathunga, Gihan Dias

Sinhala Short Sentence Similarity Calculation using Corpus-Based and Knowledge-Based Similarity Measures

no code implementations • WS 2016 • Jcs Kadupitiya, Surangika Ranathunga, Gihan Dias

Currently, corpus-based similarity, string-based similarity, and knowledge-based similarity techniques are used to compare short phrases.
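
As a rough illustration of the string-based family only, the sketch below computes token-level Jaccard similarity between two short phrases. The choice of Jaccard, the whitespace tokenisation, and the `jaccard_similarity` name are assumptions made for this example; the abstract does not say which string-based measure the paper uses.

```python
def jaccard_similarity(phrase_a: str, phrase_b: str) -> float:
    """Token-level Jaccard similarity: |A intersect B| / |A union B|."""
    tokens_a = set(phrase_a.split())  # whitespace tokenisation (assumed)
    tokens_b = set(phrase_b.split())
    union = tokens_a | tokens_b
    if not union:
        # Two empty phrases are treated as identical by convention here.
        return 1.0
    return len(tokens_a & tokens_b) / len(union)


score = jaccard_similarity("a b c", "b c d")
```

Here the two phrases share two of four distinct tokens, so the score is 0.5. Corpus-based and knowledge-based measures would instead draw on distributional statistics or a lexical resource such as a WordNet.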

Comprehensive Part-Of-Speech Tag Set and SVM based POS Tagger for Sinhala

no code implementations • WS 2016 • Sandareka Fernando, Surangika Ranathunga, Sanath Jayasena, Gihan Dias

This paper presents a new comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine based Part-Of-Speech tagger for the Sinhala language.

POS TAG

Automatic Creation of a Sentence Aligned Sinhala-Tamil Parallel Corpus

no code implementations • WS 2016 • Riyafa Abdul Hameed, Nadeeshani Pathirennehelage, Anusha Ihalapathirana, Maryam Ziyad Mohamed, Surangika Ranathunga, Sanath Jayasena, Gihan Dias, Sandareka Fernando

A sentence-aligned parallel corpus is an important prerequisite for statistical machine translation.

Machine Translation Sentence +2

Building a WordNet for Sinhala

no code implementations • WS 2014 • Indeewari Wijesiri, Malaka Gallage, Buddhika Gunathilaka, Madhuranga Lakjeewa, Daya Wimalasuriya, Gihan Dias, Rohini Paranavithana, Nisansa de Silva

Information Retrieval Word Sense Disambiguation
