Search Results for author: Sina Ahmadi

Found 28 papers, 17 papers with code

Building a Lemmatizer and a Spell-checker for Sorani Kurdish

no code implementations 27 Sep 2018 Shahin Salavati, Sina Ahmadi

This paper presents a lemmatization and word-level error correction system for Sorani Kurdish.

Language Modelling Lemmatization

Learning Noun Cases Using Sequential Neural Networks

no code implementations 9 Oct 2018 Sina Ahmadi

Morphological declension, which aims to inflect nouns to indicate number, case and gender, is an important task in natural language processing (NLP).

Sentence

A Rule-based Kurdish Text Transliteration System

1 code implementation 26 Nov 2018 Sina Ahmadi

In this article, we present a rule-based approach for transliterating the two most widely used orthographies in Sorani Kurdish.

Transliteration

Defying Wikidata: Validation of Terminological Relations in the Web of Data

1 code implementation LREC 2020 Patricia Martín-Chozas, Sina Ahmadi, Elena Montiel-Ponsoda

In this paper we present an approach to validate terminological data retrieved from open encyclopaedic knowledge bases.

Challenges of Word Sense Alignment: Portuguese Language Resources

no code implementations LREC 2020 Ana Salgado, Sina Ahmadi, Alberto Simões, John Philip McCrae, Rute Costa

Word sense alignment involves searching for matching senses within dictionary entries of different lexical resources and linking them, which poses significant challenges.

A Corpus of the Sorani Kurdish Folkloric Lyrics

1 code implementation LREC 2020 Sina Ahmadi, Hossein Hassani, Kamaladdin Abedi

We believe that this corpus contributes to Kurdish language processing in several ways, such as compensating for the lack of a long history of written text by incorporating oral literature, presenting an unexplored realm in Kurdish language processing, and assisting the initiation of Kurdish computational folkloristics.

Attribute

Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus

1 code implementation 4 Oct 2020 Sina Ahmadi, Hossein Hassani, Daban Q. Jaff

We present a corpus containing 12,327 translation pairs in the two major dialects of Kurdish, Sorani and Kurmanji.

Translation Transliteration

A Formal Description of Sorani Kurdish Morphology

no code implementations 8 Sep 2021 Sina Ahmadi

Sorani Kurdish, also known as Central Kurdish, has a complex morphology, particularly due to the patterns in which morphemes appear.

Morphological Analysis

Hunspell for Sorani Kurdish Spell Checking and Morphological Analysis

1 code implementation 14 Sep 2021 Sina Ahmadi

Spell checking and morphological analysis are two fundamental tasks in text and natural language processing and are addressed in the early stages of the development of language technology.

Morphological Analysis

Monolingual alignment of word senses and definitions in lexicographical resources

no code implementations 6 Sep 2022 Sina Ahmadi

This is a challenging task, especially due to differences in sense granularity, coverage and description in two resources.

Semantic Similarity Semantic Textual Similarity +1

PALI: A Language Identification Benchmark for Perso-Arabic Scripts

1 code implementation 3 Apr 2023 Sina Ahmadi, Milind Agarwal, Antonios Anastasopoulos

The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various linguistic communities around the globe.

Language Identification

Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki

1 code implementation 3 Apr 2023 Sina Ahmadi, Zahra Azin, Sara Belelli, Antonios Anastasopoulos

One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data.

Language Identification

Transfer Learning for Low-Resource Sentiment Analysis

1 code implementation 10 Apr 2023 Razhan Hameed, Sina Ahmadi, Fatemeh Daneshfar

Sentiment analysis is the process of identifying and extracting subjective information from text.

Data Augmentation Sentiment Analysis +1

Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities

1 code implementation 25 May 2023 Sina Ahmadi, Antonios Anastasopoulos

The wide accessibility of social media has provided linguistically under-represented communities with an extraordinary opportunity to create content in their native languages.

Language Identification Machine Translation

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

no code implementations 26 May 2023 Md Mahfuz ibn Alam, Sina Ahmadi, Antonios Anastasopoulos

Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations.

Machine Translation NMT +1

A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages

no code implementations 2 Feb 2024 Md Mahfuz ibn Alam, Sina Ahmadi, Antonios Anastasopoulos

In this paper, we propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons along with a small amount of seed parallel data.

Data Augmentation Machine Translation

Language and Speech Technology for Central Kurdish Varieties

1 code implementation 4 Mar 2024 Sina Ahmadi, Daban Q. Jaff, Md Mahfuz ibn Alam, Antonios Anastasopoulos

Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties.

Automatic Speech Recognition Language Identification +3

A Tokenization System for the Kurdish Language

1 code implementation VarDial (COLING) 2020 Sina Ahmadi

We demonstrate how the morphological complexity of the language along with the lack of a unified orthography can be efficiently addressed in tokenization.

Building a Corpus for the Zaza–Gorani Language Family

1 code implementation VarDial (COLING) 2020 Sina Ahmadi

The Zaza–Gorani language family is a linguistic subgroup of the Northwestern Iranian languages for which there is no significant corpus available.

KLPT – Kurdish Language Processing Toolkit

1 code implementation EMNLP (NLPOSS) 2020 Sina Ahmadi

Despite the recent advances in applying language-independent approaches to various natural language processing tasks thanks to artificial intelligence, some language-specific tools are still essential to process a language in a viable manner.

Lemmatization Transliteration

Cross-Lingual Link Discovery for Under-Resourced Languages

no code implementations LREC 2022 Michael Rosner, Sina Ahmadi, Elena-Simona Apostol, Julia Bosque-Gil, Christian Chiarcos, Milan Dojchinovski, Katerina Gkirtzou, Jorge Gracia, Dagmar Gromann, Chaya Liebeskind, Giedrė Valūnaitė Oleškevičienė, Gilles Sérasset, Ciprian-Octavian Truică

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages.
