no code implementations • 29 Oct 2023 • Mustafa Jarrar, Sanad Malaysha, Tymaa Hammouda, Mohammed Khalilia
To establish a Word Sense Disambiguation baseline using our SALMA corpus, we developed an end-to-end Word Sense Disambiguation system using Target Sense Verification.
no code implementations • 29 Oct 2023 • Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, Sana Ghanem
This paper presents ArBanking77, a large Arabic dataset for intent detection in the banking domain.
no code implementations • 26 Oct 2023 • Amal Nayouf, Tymaa Hammouda, Mustafa Jarrar, Fadi Zaraket, Mohamad-Bassam Kurdy
This paper presents Nabra, a corpus of Syrian Arabic dialects with morphological annotations.
no code implementations • 26 Oct 2023 • Haneen Liqreina, Mustafa Jarrar, Mohammed Khalilia, Ahmed Oumar El-Shangiti, Muhammad Abdul-Mageed
To compute the baselines of WojoodFine, we fine-tune three pre-trained Arabic BERT encoders in three settings: flat NER, nested NER, and nested NER with subtypes, and achieve F1 scores of 0.920, 0.866, and 0.885, respectively.
no code implementations • 24 Oct 2023 • Mustafa Jarrar, Muhammad Abdul-Mageed, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, Nagham Hamad, Alaa' Omar
The winning teams achieved F1 scores of 91.96 and 93.73 in FlatNER and NestedNER, respectively.
1 code implementation • 6 Sep 2023 • Nagham Hamad, Mustafa Jarrar, Mohammad Khalilia, Nadim Nashif
Fine-tuning AlephBERT on our data and testing on D_OLaH yields 69% accuracy, while fine-tuning on D_OLaH and testing on our data yields 57% accuracy, which may be an indication of the generalizability our data offers.
no code implementations • 6 Feb 2023 • Sanad Malaysha, Mustafa Jarrar, Mohammed Khalilia
The most common semantically labeled dataset for Arabic is ArabGlossBERT, a relatively small dataset that consists of 167K context-gloss pairs (about 60K positive and 107K negative pairs), collected from Arabic dictionaries.
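A minimal sketch of how context-gloss pairs of this kind can be built, assuming a toy sense inventory. The inventory entries, the English example, and the function name `build_pairs` are hypothetical illustrations, not taken from ArabGlossBERT. Each context is paired with every gloss of the target lemma and labeled 1 for the correct gloss and 0 otherwise, which is why negative pairs tend to outnumber positive ones.

```python
from typing import Dict, List, Tuple

def build_pairs(context: str, lemma: str, correct_sense: str,
                inventory: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, int]]:
    """Pair one context with every gloss of `lemma`; label 1 only for the correct sense."""
    pairs = []
    for sense_id, gloss in inventory[lemma].items():
        label = 1 if sense_id == correct_sense else 0
        pairs.append((context, gloss, label))
    return pairs

# Hypothetical toy inventory (assumption: not real dictionary data).
inventory = {
    "bank": {
        "bank.1": "a financial institution that accepts deposits",
        "bank.2": "sloping land beside a body of water",
        "bank.3": "a row of similar objects",
    }
}

pairs = build_pairs("She deposited the check at the bank.", "bank", "bank.1", inventory)
positives = sum(label for _, _, label in pairs)
negatives = len(pairs) - positives
```

With three glosses for the lemma, one context yields one positive and two negative pairs.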
1 code implementation • 4 Feb 2023 • Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas
Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate.
no code implementations • 13 Dec 2022 • Mustafa Jarrar, Fadi A Zaraket, Tymaa Hammouda, Daanish Masood Alavi, Martin Waahlisch
This article presents the Lisan corpora: morphologically annotated corpora of the Yemeni, Sudanese, Iraqi, and Libyan Arabic dialects.
no code implementations • 20 May 2022 • Eman Naser-Karajah, Nabil Arman, Mustafa Jarrar
The third approach is to construct new WordNets by exploring synonymy graphs, and the fourth approach is to find similar words from corpora using Deep Learning methods, such as word embeddings and, more recently, BERT models.
1 code implementation • LREC 2022 • Mustafa Jarrar, Mohammed Khalilia, Sana Ghanem
This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER).
no code implementations • 19 May 2022 • Mustafa Jarrar
The ontology provides a formal representation of the concepts that the Arabic terms convey. Its content was built with ontological analysis in mind and benchmarked, as far as possible, against scientific advances and rigorous knowledge sources, rather than only against speakers' beliefs as lexicons typically are.
no code implementations • RANLP 2021 • Moustafa Al-Hajj, Mustafa Jarrar
First, we constructed a dataset of labeled Arabic context-gloss pairs (~167k pairs) that we extracted from the Arabic Ontology and the large lexicographic database available at Birzeit University.
no code implementations • LREC 2022 • Karim El Haff, Mustafa Jarrar, Tymaa Hammouda, Fadi Zaraket
This is due to many factors, including the complex and rich morphology of Arabic, its high degree of ambiguity, and the presence of several regional varieties that need to be processed while taking into account their unique characteristics.
no code implementations • SEMEVAL 2021 • Moustafa Al-Hajj, Mustafa Jarrar
This paper presents a set of experiments to evaluate and compare the performance of CBOW Word2Vec and Lemma2Vec models for Arabic Word-in-Context (WiC) disambiguation without using sense inventories or sense embeddings.
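One common way to approach WiC with static embeddings and no sense inventory is to compare the averaged context vectors of the two sentences. The sketch below illustrates that general idea only; it is not the paper's method. The two-dimensional vectors, the 0.5 threshold, and the function names are assumptions made for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]

def context_vector(tokens: List[str], target: str, emb: Dict[str, Vector]) -> Vector:
    """Average the embeddings of the known context words, excluding the target."""
    vecs = [emb[t] for t in tokens if t != target and t in emb]
    dim = len(next(iter(emb.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def same_sense(s1: List[str], s2: List[str], target: str,
               emb: Dict[str, Vector], threshold: float = 0.5) -> bool:
    """Predict 'same sense' when the two context vectors are similar enough."""
    c1 = context_vector(s1, target, emb)
    c2 = context_vector(s2, target, emb)
    return cosine(c1, c2) >= threshold

# Toy embeddings (assumption: hand-made, not trained Word2Vec or Lemma2Vec vectors).
emb = {
    "river": [1.0, 0.0], "water": [0.9, 0.1], "shore": [0.8, 0.2],
    "money": [0.0, 1.0], "loan": [0.1, 0.9],
}

same = same_sense(["river", "bank", "water"], ["bank", "shore"], "bank", emb)
different = same_sense(["river", "bank"], ["bank", "loan", "money"], "bank", emb)
```

In practice the vectors would come from a trained model and the threshold would be tuned on development data.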
no code implementations • EACL (GWC) 2021 • Mustafa Jarrar, Eman Karajah, Muhammad Khalifa, Khaled Shaalan
We present our progress in developing a novel algorithm to extract synonyms from bilingual dictionaries.
no code implementations • 25 Nov 2020 • Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Huseein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak
The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.