1 code implementation • ACL (GEM) 2021 • Figen Beken Fikri, Kemal Oflazer, Berrin Yanikoglu
To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well.
no code implementations • 1 Jul 2024 • Abrar Abir, Kemal Oflazer
This paper investigates the optimization of propaganda technique detection in Arabic text, including tweets and news paragraphs, from ArAIEval shared task 1.
no code implementations • 23 Oct 2023 • Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, Anna Cai, Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek Vijayakumar, Haofei Yu, Hinrich Schütze, Kemal Oflazer, David R. Mortensen
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills.
no code implementations • 9 Oct 2018 • Arun Pandian, Lamana Mulaffer, Kemal Oflazer, Amna AlZeyara
This paper presents a neural network classifier approach to detecting both within- and cross-document event coreference effectively using only event-mention-based features.
coreference-resolution
Cross Document Coreference Resolution
no code implementations • LREC 2018 • Ossama Obeid, Salam Khalifa, Nizar Habash, Houda Bouamor, Wajdi Zaghouani, Kemal Oflazer
In this paper, we introduce MADARi, a joint morphological annotation and spelling correction system for texts in Standard and Dialectal Arabic.
no code implementations • WS 2016 • Wajdi Zaghouani, Abdelati Hawwari, Sawsan Alqahtani, Houda Bouamor, Mahmoud Ghoneim, Mona Diab, Kemal Oflazer
Arabic writing is typically underspecified for short vowels and other markups, referred to as diacritics.
no code implementations • LREC 2016 • Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer
We present our guidelines and annotation procedure to create a human-corrected, post-edited corpus of machine-translated text for Modern Standard Arabic.
no code implementations • LREC 2016 • Wajdi Zaghouani, Houda Bouamor, Abdelati Hawwari, Mona Diab, Ossama Obeid, Mahmoud Ghoneim, Sawsan Alqahtani, Kemal Oflazer
This paper presents the annotation guidelines developed as part of an effort to create a large scale manually diacritized corpus for various Arabic text genres.
no code implementations • LREC 2014 • Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Ossama Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah Alkuhlani, Kemal Oflazer
Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.
no code implementations • LREC 2014 • Ahmed Salama, Houda Bouamor, Behrang Mohit, Kemal Oflazer
This paper presents YOUDACC, an automatically annotated large-scale multi-dialectal Arabic corpus collected from user comments on YouTube videos.
no code implementations • LREC 2014 • Houda Bouamor, Nizar Habash, Kemal Oflazer
The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic.
no code implementations • LREC 2012 • Emad Mohamed, Behrang Mohit, Kemal Oflazer
A per-letter classification scheme, in which each letter is classified as either a segment boundary or not, combined with a memory-based classifier that uses only word-internal context, proves effective and achieves 92% exact-match accuracy at the word level.
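The per-letter scheme described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration rather than the authors' implementation: the window size, the overlap similarity, the plain nearest-neighbour lookup standing in for a memory-based learner, and the toy English training words are all hypothetical.

```python
from collections import Counter

PAD = "_"  # padding symbol for positions outside the word (assumption)

def letter_features(word, i, window=2):
    """Characters around letter i, drawn only from inside the word."""
    padded = PAD * window + word + PAD * window
    return tuple(padded[i:i + 2 * window + 1])

def make_instances(word, segments, window=2):
    """One (features, label) pair per letter; label 1 = boundary follows."""
    boundaries, pos = set(), 0
    for seg in segments[:-1]:
        pos += len(seg)
        boundaries.add(pos - 1)
    return [(letter_features(word, i, window), int(i in boundaries))
            for i in range(len(word))]

def overlap(a, b):
    """Number of feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))

def classify(memory, feats, k=1):
    """Majority label among the k stored instances most similar to feats."""
    nearest = sorted(memory, key=lambda inst: -overlap(inst[0], feats))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def segment(memory, word, window=2, k=1):
    """Split a word wherever a letter is classified as a segment boundary."""
    out, current = [], ""
    for i, ch in enumerate(word):
        current += ch
        is_last = i == len(word) - 1
        if not is_last and classify(memory, letter_features(word, i, window), k):
            out.append(current)
            current = ""
    out.append(current)
    return out

def exact_match_accuracy(memory, gold):
    """Share of words whose predicted segmentation matches gold exactly."""
    hits = sum(segment(memory, w) == segs for w, segs in gold)
    return hits / len(gold)

# Hypothetical toy data, not taken from the paper.
TOY_TRAIN = [
    ("walked", ["walk", "ed"]),
    ("talked", ["talk", "ed"]),
    ("jumps", ["jump", "s"]),
    ("walks", ["walk", "s"]),
]
memory = [inst for w, segs in TOY_TRAIN for inst in make_instances(w, segs)]
```

Calling `segment(memory, "talks")` on this toy memory splits the unseen word after `talk`, because the context around `k` matches a stored boundary instance from `walks`. Word-level exact match is stricter than per-letter accuracy: a single misclassified letter makes the whole word count as wrong.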