no code implementations • 3 Nov 2024 • Tymaa Hammouda, Mustafa Jarrar, Mohammed Khalilia
We introduce SinaTools, an open-source Python package for Arabic natural language processing and understanding.
no code implementations • 6 Oct 2024 • Bashar Talafha, Karima Kadaoui, Samar Mohamed Magdy, Mariem Habiboullah, Chafei Mohamed Chafei, Ahmed Oumar El-Shangiti, Hiba Zayed, Mohamedou cheikh tourad, Rahaf Alhamouri, Rwaa Assi, Aisha Alraeesi, Hour Mohamed, Fakhraddin Alwajih, Abdelrahman Mohamed, Abdellah El Mekki, El Moatez Billah Nagoudi, Benelhadj Djelloul Mama Saadia, Hamzah A. Alsayadi, Walid Al-Dhabyani, Sara Shatnawi, Yasir Ech-Chammakhy, Amal Makouar, Yousra Berrachedi, Mustafa Jarrar, Shady Shehata, Ismail Berrada, Muhammad Abdul-Mageed
In spite of the recent progress in speech processing, the majority of the world's languages and dialects remain uncovered.
no code implementations • 30 Jul 2024 • Alaa Aljabari, Lina Duaibes, Mustafa Jarrar, Mohammed Khalilia
Additionally, we propose a novel method for event relation extraction using BERT, in which we treat the task as text entailment.
no code implementations • 30 Jul 2024 • Mohammed Khalilia, Sanad Malaysha, Reem Suwaileh, Mustafa Jarrar, Alaa Aljabari, Tamer Elsayed, Imed Zitouni
This paper presents an overview of the Arabic Natural Language Understanding (ArabicNLU 2024) shared task, focusing on two subtasks: Word Sense Disambiguation (WSD) and Location Mention Disambiguation (LMD).
no code implementations • 25 Jul 2024 • Wajdi Zaghouani, Mustafa Jarrar, Nizar Habash, Houda Bouamor, Imed Zitouni, Mona Diab, Samhaa R. El-Beltagy, Muhammed AbuOdeh
The shared task addresses bias and propaganda annotation in multilingual news posts.
no code implementations • 13 Jul 2024 • Sanad Malaysha, Mo El-Haj, Saad Ezzini, Mohammed Khalilia, Mustafa Jarrar, Sultan Almujaiwel, Ismail Berrada, Houda Bouamor
Specifically, 11 teams participated in Subtask 1, while only 1 team participated in Subtask 2.
no code implementations • 13 Jul 2024 • Mustafa Jarrar, Nagham Hamad, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, Muhammad Abdul-Mageed
The winning teams achieved F1 scores of 91% and 92% in the Flat Fine-Grained and Nested Fine-Grained Subtasks, respectively.
no code implementations • 12 Jul 2024 • Lina Duaibes, Areej Jaber, Mustafa Jarrar, Ahmad Qadi, Mais Qandeel
The proliferation of bias and propaganda on social media is an increasingly significant concern, leading to the development of techniques for automatic detection.
no code implementations • 6 Jun 2024 • Mustafa Jarrar, Tymaa Hammouda
Compared with other lexicons, Qabas is the most extensive Arabic lexicon, encompassing about 58K lemmas (45K nominal lemmas, 12.5K verbal lemmas, and 473 functional-word lemmas).
no code implementations • 6 Jun 2024 • Sylvio Barbon Junior, Paolo Ceravolo, Sven Groppe, Mustafa Jarrar, Samira Maghool, Florence Sèdes, Soror Sahri, Maurice van Keulen
A Language Model is a term that encompasses various types of models designed to understand and generate human communication.
no code implementations • 1 May 2024 • Sanad Malaysha, Mustafa Jarrar, Mohammed Khalilia
Semantic textual relatedness is a broader concept than semantic similarity.
no code implementations • 29 Oct 2023 • Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, Sana Ghanem
This paper presents ArBanking77, a large Arabic dataset for intent detection in the banking domain.
no code implementations • 29 Oct 2023 • Mustafa Jarrar, Sanad Malaysha, Tymaa Hammouda, Mohammed Khalilia
To establish a Word Sense Disambiguation baseline using our SALMA corpus, we developed an end-to-end Word Sense Disambiguation system using Target Sense Verification.
no code implementations • 26 Oct 2023 • Haneen Liqreina, Mustafa Jarrar, Mohammed Khalilia, Ahmed Oumar El-Shangiti, Muhammad Abdul-Mageed
To compute the baselines of WojoodFine, we fine-tune three pre-trained Arabic BERT encoders in three settings (flat NER, nested NER, and nested NER with subtypes) and achieve F1 scores of 0.920, 0.866, and 0.885, respectively.
no code implementations • 26 Oct 2023 • Amal Nayouf, Tymaa Hammouda, Mustafa Jarrar, Fadi Zaraket, Mohamad-Bassam Kurdy
This paper presents Nabra, a corpus of Syrian Arabic dialects with morphological annotations.
no code implementations • 24 Oct 2023 • Mustafa Jarrar, Muhammad Abdul-Mageed, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, Nagham Hamad, Alaa' Omar
The winning teams achieved F1 scores of 91.96 and 93.73 in FlatNER and NestedNER, respectively.
1 code implementation • 6 Sep 2023 • Nagham Hamad, Mustafa Jarrar, Mohammad Khalilia, Nadim Nashif
Fine-tuning AlephBERT on our data and testing on D_OLaH yields 69% accuracy, while fine-tuning on D_OLaH and testing on our data yields 57% accuracy, which may indicate the generalizability our data offers.
no code implementations • 6 Feb 2023 • Sanad Malaysha, Mustafa Jarrar, Mohammed Khalilia
The most common semantically-labeled dataset for Arabic is the ArabGlossBERT, a relatively small dataset that consists of 167K context-gloss pairs (about 60K positive and 107K negative pairs), collected from Arabic dictionaries.
1 code implementation • 4 Feb 2023 • Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas
Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate.
no code implementations • 13 Dec 2022 • Mustafa Jarrar, Fadi A Zaraket, Tymaa Hammouda, Daanish Masood Alavi, Martin Waahlisch
This article presents the Lisan corpora: morphologically annotated corpora of the Yemeni, Sudanese, Iraqi, and Libyan Arabic dialects.
no code implementations • 20 May 2022 • Eman Naser-Karajah, Nabil Arman, Mustafa Jarrar
The third approach is to construct new WordNets by exploring synonymy graphs, and the fourth approach is to find similar words in corpora using Deep Learning methods, such as word embeddings and, more recently, BERT models.
1 code implementation • LREC 2022 • Mustafa Jarrar, Mohammed Khalilia, Sana Ghanem
This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER).
no code implementations • LREC 2022 • Karim El Haff, Mustafa Jarrar, Tymaa Hammouda, Fadi Zaraket
This is due to many factors, including the complex and rich morphology of Arabic, its high degree of ambiguity, and the presence of several regional varieties that need to be processed while taking into account their unique characteristics.
no code implementations • RANLP 2021 • Moustafa Al-Hajj, Mustafa Jarrar
First, we constructed a dataset of labeled Arabic context-gloss pairs (~167K pairs) that we extracted from the Arabic Ontology and the large lexicographic database available at Birzeit University.
no code implementations • 19 May 2022 • Mustafa Jarrar
The ontology provides a formal representation of the concepts that the Arabic terms convey. Its content was built with ontological analysis in mind and benchmarked, as far as possible, against scientific advances and rigorous knowledge sources, rather than only against speakers' beliefs, as lexicons typically are.
no code implementations • SEMEVAL 2021 • Moustafa Al-Hajj, Mustafa Jarrar
This paper presents a set of experiments to evaluate and compare the performance of CBOW Word2Vec and Lemma2Vec models for Arabic Word-in-Context (WiC) disambiguation without using sense inventories or sense embeddings.
no code implementations • EACL (GWC) 2021 • Mustafa Jarrar, Eman Karajah, Muhammad Khalifa, Khaled Shaalan
We present our progress in developing a novel algorithm to extract synonyms from bilingual dictionaries.
no code implementations • 25 Nov 2020 • Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Huseein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak
The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.