no code implementations • LREC 2022 • Nurpeiis Baimukan, Houda Bouamor, Nizar Habash
We test the value of such aggregation by building language models and using them in dialect identification.
1 code implementation • GeBNLP (COLING) 2020 • Bashar Alhafni, Nizar Habash, Houda Bouamor
In this paper, we present an approach for sentence-level gender reinflection using linguistically enhanced sequence-to-sequence models.
no code implementations • 22 Oct 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor, Ossama Obeid, Sultan Alrowili, Daliyah AlZeer, Khawlah M. Alshanqiti, Ahmed ElBakry, Muhammad ElNokrashy, Mohamed Gabr, Abderrahmane Issam, Abdelrahim Qaddoumi, K. Vijay-Shanker, Mahmoud Zyate
In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop.
1 code implementation • 18 Oct 2022 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash
We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI 2022).
1 code implementation • NAACL 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor
In this paper, we define the task of gender rewriting in contexts involving two users (I and/or You) - first and second grammatical persons with independent grammatical gender preferences.
no code implementations • LREC 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor
Much of the research on this issue has focused on mitigating gender bias in English NLP models and systems.
1 code implementation • EACL (WANLP) 2021 • Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, Nizar Habash
In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models.
1 code implementation • EACL (WANLP) 2021 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash
This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).
no code implementations • 25 Nov 2020 • Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Hussein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak
The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.
no code implementations • COLING (WANLP) 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash
The data for the shared task covers a total of 100 provinces from 21 Arab countries and was collected from Twitter.
no code implementations • LREC 2020 • Fadhl Eryani, Nizar Habash, Houda Bouamor, Salam Khalifa
In this paper, we present the MADAR CODA Corpus, a collection of 10,000 sentences from five Arabic city dialects (Beirut, Cairo, Doha, Rabat, and Tunis) represented in the Conventional Orthography for Dialectal Arabic (CODA) in parallel with their raw original form.
no code implementations • WS 2019 • Alexander Erdmann, Salam Khalifa, Mai Oudah, Nizar Habash, Houda Bouamor
We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language-specific input.
no code implementations • WS 2019 • Nizar Habash, Houda Bouamor, Christine Chung
The impressive progress in many Natural Language Processing (NLP) applications has increased the awareness of some of the biases these NLP systems have with regard to gender identities.
no code implementations • WS 2019 • Houda Bouamor, Sabit Hassan, Nizar Habash
In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification.
no code implementations • NAACL 2019 • Ossama Obeid, Mohammad Salameh, Houda Bouamor, Nizar Habash
This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text.
no code implementations • LREC 2018 • Ossama Obeid, Salam Khalifa, Nizar Habash, Houda Bouamor, Wajdi Zaghouani, Kemal Oflazer
In this paper, we introduce MADARi, a joint morphological annotation and spelling correction system for texts in Standard and Dialectal Arabic.
no code implementations • COLING 2018 • Mohammad Salameh, Houda Bouamor, Nizar Habash
Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification).
no code implementations • LREC 2018 • Nizar Habash, Fadhl Eryani, Salam Khalifa, Owen Rambow, Dana Abdulrahim, Alexander Erdmann, Reem Faraj, Wajdi Zaghouani, Houda Bouamor, Nasser Zalmout, Sara Hassan, Faisal Al-Shargi, Sakhar Alkhereyf, Basma Abdulkareem, Ramy Eskander, Mohammad Salameh, Hind Saddiki
no code implementations • MTSummit 2017 • Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor
We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus.
no code implementations • WS 2016 • Wajdi Zaghouani, Abdelati Hawwari, Sawsan Alqahtani, Houda Bouamor, Mahmoud Ghoneim, Mona Diab, Kemal Oflazer
Arabic writing is typically underspecified for short vowels and other markups, referred to as diacritics.
no code implementations • COLING 2016 • Francisco Guzmán, Houda Bouamor, Ramy Baly, Nizar Habash
Evaluation of machine translation (MT) into morphologically rich languages (MRL) has not been well studied despite posing many challenges.
no code implementations • LREC 2016 • Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer
We present our guidelines and annotation procedure to create a human-corrected, post-edited machine translation corpus for Modern Standard Arabic.
no code implementations • LREC 2016 • Wajdi Zaghouani, Houda Bouamor, Abdelati Hawwari, Mona Diab, Ossama Obeid, Mahmoud Ghoneim, Sawsan Alqahtani, Kemal Oflazer
This paper presents the annotation guidelines developed as part of an effort to create a large-scale, manually diacritized corpus for various Arabic text genres.
no code implementations • LREC 2016 • Salam Khalifa, Houda Bouamor, Nizar Habash
Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP).
no code implementations • LREC 2014 • Houda Bouamor, Nizar Habash, Kemal Oflazer
The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic.
no code implementations • LREC 2014 • Ahmed Salama, Houda Bouamor, Behrang Mohit, Kemal Oflazer
This paper presents YOUDACC, an automatically annotated large-scale multi-dialectal Arabic corpus collected from user comments on YouTube videos.
no code implementations • LREC 2012 • Houda Bouamor, Aurélien Max, Gabriel Illouz, Anne Vilnat
This paper addresses the issue of what approach should be used for building a corpus of sentential paraphrases depending on one's requirements.