1 code implementation • LREC 2020 • Ossama Obeid, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, Nizar Habash
We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python.
2 code implementations • 24 May 2023 • Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Toru Sasaki, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov
These results show that the problem is far from solved and that there is a lot of room for improvement.
1 code implementation • EACL (WANLP) 2021 • Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, Nizar Habash
In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models.
1 code implementation • 20 Feb 2024 • Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, Timothy Baldwin
The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models.
2 code implementations • CoNLL (EMNLP) 2021 • Riadh Belkebir, Nizar Habash
We present ARETA, an automatic error type annotation system for Modern Standard Arabic.
1 code implementation • UDW (COLING) 2020 • Dima Taji, Nizar Habash
We present PALMYRA 2.0, a graphical dependency-tree visualization and editing software.
1 code implementation • 24 May 2023 • Bashar Alhafni, Go Inoue, Christian Khairallah, Nizar Habash
We also define the task of multi-class Arabic grammatical error detection (GED) and present the first results on multi-class Arabic GED.
1 code implementation • EACL (WANLP) 2021 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash
This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).
1 code implementation • GeBNLP (COLING) 2020 • Bashar Alhafni, Nizar Habash, Houda Bouamor
In this paper, we present an approach for sentence-level gender reinflection using linguistically enhanced sequence-to-sequence models.
1 code implementation • NAACL (SIGMORPHON) 2022 • Nizar Habash, Reham Marzouk, Christian Khairallah, Salam Khalifa
Arabic is a morphologically rich and complex language, with numerous dialectal variants.
1 code implementation • 18 Oct 2022 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash
We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI 2022).
1 code implementation • Findings (ACL) 2022 • Go Inoue, Salam Khalifa, Nizar Habash
We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models.
1 code implementation • NAACL 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor
In this paper, we define the task of gender rewriting in contexts involving two users (I and/or You) - first and second grammatical persons with independent grammatical gender preferences.
1 code implementation • ACL 2020 • Alexander Erdmann, Micha Elsner, Shijie Wu, Ryan Cotterell, Nizar Habash
Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm.
1 code implementation • 30 Jan 2024 • Kurt Micallef, Nizar Habash, Claudia Borg, Fadhl Eryani, Houda Bouamor
Although multilingual language models exhibit impressive cross-lingual transfer capabilities on unseen languages, the performance on downstream tasks is impacted when there is a script disparity with the languages used in the multilingual model's pre-training data.
no code implementations • MTSummit 2017 • Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor
We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus.
no code implementations • 12 Sep 2016 • Ahmed El Kholy, Nizar Habash
One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages.
no code implementations • LREC 2016 • Salam Khalifa, Nizar Habash, Dana Abdulrahim, Sara Hassan
Most Arabic natural language processing tools and resources are developed to serve Modern Standard Arabic (MSA), which is the official written language in the Arab World.
no code implementations • 18 Jun 2016 • Hassan Sajjad, Nadir Durrani, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Stephan Vogel, Wael Salloum, Ahmed El Kholy, Nizar Habash
The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech.
no code implementations • 8 Jun 2016 • Amjad Almahairi, Kyunghyun Cho, Nizar Habash, Aaron Courville
Neural machine translation has become a major alternative to widely used phrase-based statistical machine translation.
no code implementations • 22 Sep 2013 • Mona Diab, Nizar Habash, Owen Rambow, Ryan Roth
The Linguistic Data Consortium (LDC) has developed hundreds of data corpora for natural language processing (NLP) research.
no code implementations • LREC 2018 • Ossama Obeid, Salam Khalifa, Nizar Habash, Houda Bouamor, Wajdi Zaghouani, Kemal Oflazer
In this paper, we introduce MADARi, a joint morphological annotation and spelling correction system for texts in Standard and Dialectal Arabic.
no code implementations • EMNLP 2018 • Daniel Watson, Nasser Zalmout, Nizar Habash
We show that providing the model with word-level features bridges the gap for the neural network approach to achieve a state-of-the-art F1 score on a standard Arabic language correction shared task dataset.
no code implementations • ACL 2018 • Alexander Erdmann, Nasser Zalmout, Nizar Habash
Arabic dialects lack large corpora and are noisy, being linguistically disparate with no standardized spelling.
no code implementations • EACL 2017 • Nizar Habash, Nasser Zalmout, Dima Taji, Hieu Hoang, Maverick Alzate
We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic.
no code implementations • NAACL 2018 • Nasser Zalmout, Alexander Erdmann, Nizar Habash
User-generated text tends to be noisy with many lexical and orthographic inconsistencies, making natural language processing (NLP) tasks more challenging.
no code implementations • SEMEVAL 2017 • Ramy Baly, Gilbert Badaro, Ali Hamdi, Rawan Moukalled, Rita Aoun, Georges El-Khoury, Ahmad Al Sallab, Hazem Hajj, Nizar Habash, Khaled Shaban, Wassim El-Hajj
While sentiment analysis in English has achieved significant progress, it remains a challenging task in Arabic given the rich morphology of the language.
no code implementations • SEMEVAL 2017 • Chukwuyem Onyibe, Nizar Habash
We describe a supervised system that uses optimized Conditional Random Fields and lexical features to predict the sentiment of a tweet.
no code implementations • EMNLP 2017 • Nasser Zalmout, Nizar Habash
We make use of the resulting morphological models for scoring and ranking the analyses of the morphological analyzer for morphological disambiguation.
no code implementations • CONLL 2017 • Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.
no code implementations • WS 2018 • Hind Saddiki, Nizar Habash, Violetta Cavalli-Sforza, Muhamed Al Khalil
Advances in automatic readability assessment can impact the way people consume information in a number of domains.
no code implementations • COLING 2018 • Halim-Antoine Boukaram, Nizar Habash, Micheline Ziadee, Majd Sakr
Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks.
no code implementations • WS 2018 • Dana Abu Ali, Muaz Ahmad, Hayat Al Hassan, Paula Dozsa, Ming Hu, Jose Varias, Nizar Habash
This demonstration paper presents a bilingual (Arabic-English) interactive human avatar dialogue system.
no code implementations • WS 2018 • Alexander Erdmann, Nizar Habash
Morphologically rich languages are challenging for natural language processing tasks due to data sparsity.
no code implementations • WS 2018 • Dima Taji, Salam Khalifa, Ossama Obeid, Fadhl Eryani, Nizar Habash
We introduce CALIMA-Star, a very rich Arabic morphological analyzer and generator that provides functional and form-based morphological features as well as built-in tokenization, phonological representation, lexical rationality and much more.
no code implementations • WS 2017 • Salam Khalifa, Sara Hassan, Nizar Habash
We present CALIMAGLF, a Gulf Arabic morphological analyzer currently covering over 2,600 verbal lemmas.
no code implementations • WS 2017 • Ramy Baly, Gilbert Badaro, Georges El-Khoury, Rawan Moukalled, Rita Aoun, Hazem Hajj, Wassim El-Hajj, Nizar Habash, Khaled Shaban
Opinion mining in Arabic is a challenging task given the rich morphology of the language.
no code implementations • WS 2017 • Lingliang Zhang, Nizar Habash, Godfried Toussaint
We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard.
no code implementations • WS 2017 • Dima Taji, Nizar Habash, Daniel Zeman
We describe the process of creating NUDAR, a Universal Dependency treebank for Arabic.
no code implementations • WS 2016 • Nasser Zalmout, Hind Saddiki, Nizar Habash
Much research in education has been done on the study of different language teaching methods.
no code implementations • COLING 2018 • Mohammad Salameh, Houda Bouamor, Nizar Habash
Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification).
no code implementations • COLING 2018 • Shehroze Khan, Jihyun Kim, Tarik Zulfikarpasic, Peter Chen, Nizar Habash
We present Qutr (Query Translator), a smart cross-lingual communication application for the travel domain.
no code implementations • COLING 2016 • Francisco Guzmán, Houda Bouamor, Ramy Baly, Nizar Habash
Evaluation of machine translation (MT) into morphologically rich languages (MRL) has not been well studied despite posing many challenges.
no code implementations • COLING 2016 • Ramy Eskander, Nizar Habash, Owen Rambow, Arfath Pasha
Arabic dialects present a special problem for natural language processing because they have few resources and no standard orthography, and they have not been studied much.
no code implementations • COLING 2016 • Dana Abu Ali, Nizar Habash
This paper presents BOTTA, the first Arabic dialect chatbot.
no code implementations • COLING 2016 • Salam Khalifa, Nasser Zalmout, Nizar Habash
In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator.
no code implementations • COLING 2016 • Anas Shahrour, Salam Khalifa, Dima Taji, Nizar Habash
In this paper, we present CamelParser, a state-of-the-art system for Arabic syntactic dependency analysis aligned with contextually disambiguated morphological features.
no code implementations • LREC 2018 • Nizar Habash, Fadhl Eryani, Salam Khalifa, Owen Rambow, Dana Abdulrahim, Alexander Erdmann, Reem Faraj, Wajdi Zaghouani, Houda Bouamor, Nasser Zalmout, Sara Hassan, Faisal Al-Shargi, Sakhar Alkhereyf, Basma Abdulkareem, Ramy Eskander, Mohammad Salameh, Hind Saddiki
no code implementations • 29 Jan 2019 • Dima Taji, Jamila El Gizuli, Nizar Habash
In this paper we present a dependency treebank of travel domain sentences in Modern Standard Arabic.
no code implementations • WS 2014 • Ann Bies, Zhiyi Song, Mohamed Maamouri, Stephen Grimes, Haejoong Lee, Jonathan Wright, Stephanie Strassel, Nizar Habash, Ramy Eskander, Owen Rambow
no code implementations • WS 2013 • Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, Eric Villemonte de la Clergerie
no code implementations • LREC 2014 • Mohamed Maamouri, Ann Bies, Seth Kulick, Michael Ciul, Nizar Habash, Ramy Eskander
This paper describes the parallel development of an Egyptian Arabic Treebank and a morphological analyzer for Egyptian Arabic (CALIMA).
no code implementations • LREC 2014 • Mona Diab, Mohamed Al-Badrashiny, Maryam Aminian, Mohammed Attia, Heba Elfardy, Nizar Habash, Abdelati Hawwari, Wael Salloum, Pradeep Dasigi, Ramy Eskander
Multiple levels of quality checks are performed on the output of each step in the creation process.
no code implementations • LREC 2014 • Inès Zribi, Rahma Boujelbane, Abir Masmoudi, Mariem Ellouze, Lamia Belguith, Nizar Habash
Tunisian Arabic is a dialect of the Arabic language spoken in Tunisia.
no code implementations • LREC 2014 • Abir Masmoudi, Mariem Ellouze Khmekhem, Yannick Estève, Lamia Hadrich Belguith, Nizar Habash
In this paper we describe an effort to create a corpus and phonetic dictionary for Tunisian Arabic Automatic Speech Recognition (ASR).
no code implementations • LREC 2014 • Houda Bouamor, Nizar Habash, Kemal Oflazer
The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic.
no code implementations • LREC 2014 • Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan Roth
In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007).
no code implementations • LREC 2014 • Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Ossama Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah Alkuhlani, Kemal Oflazer
Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.
no code implementations • LREC 2012 • Nizar Habash, Mona Diab, Owen Rambow
Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world.
no code implementations • JEPTALNRECITAL 2013 • Ahmed Hamdi, Rahma Boujelbane, Nizar Habash, Alexis Nasr
no code implementations • NAACL 2019 • Ossama Obeid, Mohammad Salameh, Houda Bouamor, Nizar Habash
This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text.
no code implementations • LREC 2016 • Salam Khalifa, Houda Bouamor, Nizar Habash
Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP).
no code implementations • LREC 2016 • Faisal Al-Shargi, Aidan Kaplan, Ramy Eskander, Nizar Habash, Owen Rambow
We present new language resources for Moroccan and Sanaani Yemeni Arabic.
no code implementations • LREC 2016 • Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer
We present our guidelines and annotation procedure for creating a human-corrected, post-edited machine translation corpus for Modern Standard Arabic.
no code implementations • LREC 2016 • Irina Temnikova, Wajdi Zaghouani, Stephan Vogel, Nizar Habash
The goal of the cognitive machine translation (MT) evaluation approach is to build classifiers which assign post-editing effort scores to new texts.
no code implementations • LREC 2016 • Mohamed Al-Badrashiny, Arfath Pasha, Mona Diab, Nizar Habash, Owen Rambow, Wael Salloum, Ramy Eskander
Text preprocessing is an important and necessary task for all NLP applications.
no code implementations • LREC 2016 • Nizar Habash, Anas Shahrour, Muhamed Al-Khalil
We present a novel technique for Arabic morphological annotation.
no code implementations • LREC 2016 • Ayman Al Zaatari, Rim El Ballouli, Shady ELbassouni, Wassim El-Hajj, Hazem Hajj, Khaled Shaban, Nizar Habash, Emad Yahya
We focus on Arabic due to the recent popularity of blogs and microblogs in the Arab World and due to the lack of any such public corpora in Arabic.
no code implementations • WS 2019 • Mai Oudah, Amjad Almahairi, Nizar Habash
Neural networks have become the state-of-the-art approach for machine translation (MT) in many languages.
no code implementations • 14 Jul 2019 • Ella Noll, Mai Oudah, Nizar Habash
A common bottleneck for developing machine translation (MT) systems for some language pairs is the lack of direct parallel translation data sets, in general and in certain domains.
no code implementations • ACL 2019 • William Held, Nizar Habash
Hypernymy modeling has largely been separated according to two paradigms, pattern-based methods and distributional methods.
no code implementations • WS 2019 • Nizar Habash, Houda Bouamor, Christine Chung
The impressive progress in many Natural Language Processing (NLP) applications has increased the awareness of some of the biases these NLP systems have with regards to gender identities.
no code implementations • WS 2019 • Alexander Erdmann, Salam Khalifa, Mai Oudah, Nizar Habash, Houda Bouamor
We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input.
no code implementations • WS 2019 • Faisal Alshargi, Shahd Dibas, Sakhar Alkhereyf, Reem Faraj, Basmah Abdulkareem, Sane Yagi, Ouafaa Kacha, Nizar Habash, Owen Rambow
These corpora will be publicly available to serve as benchmarks for training and evaluating systems for Arabic dialect morphological analysis and disambiguation.
no code implementations • WS 2019 • Houda Bouamor, Sabit Hassan, Nizar Habash
In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification.
no code implementations • ACL 2020 • Nasser Zalmout, Nizar Habash
Semitic languages can be highly ambiguous, having several interpretations of the same surface forms, and morphologically rich, having many morphemes that realize several morphological features.
no code implementations • ACL 2019 • Nasser Zalmout, Nizar Habash
In this paper we explore the use of multitask learning and adversarial training to address morphological richness and dialectal variations in the context of full morphological tagging.
no code implementations • LREC 2020 • Alberto Chierici, Nizar Habash, Margarita Bicec
The first challenges are to define a sensible methodology for data collection and to create useful data sets for training the system to retrieve the best answer to a user's question.
no code implementations • LREC 2020 • Muhamed Al Khalil, Nizar Habash, Zhengyang Jiang
We present a large-scale 26,000-lemma leveled readability lexicon for Modern Standard Arabic.
no code implementations • LREC 2020 • Salam Khalifa, Nasser Zalmout, Nizar Habash
In this paper we present the first full morphological analysis and disambiguation system for Gulf Arabic.
no code implementations • LREC 2020 • Fadhl Eryani, Nizar Habash, Houda Bouamor, Salam Khalifa
In this paper, we present the MADAR CODA Corpus, a collection of 10,000 sentences from five Arabic city dialects (Beirut, Cairo, Doha, Rabat, and Tunis) represented in the Conventional Orthography for Dialectal Arabic (CODA) in parallel with their raw original form.
no code implementations • COLING (WANLP) 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash
The data for the shared task covers a total of 100 provinces from 21 Arab countries and is collected from the Twitter domain.
no code implementations • 25 Nov 2020 • Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Huseein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak
The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.
no code implementations • COLING 2020 • Yash Kankanampati, Joseph Le Roux, Nadi Tomeh, Dima Taji, Nizar Habash
In this paper we present a parsing model for projective dependency trees that takes advantage of complementary dependency annotations, as is the case in Arabic with the availability of the CATiB and UD treebanks.
no code implementations • COLING 2020 • Nasser Zalmout, Nizar Habash
In addition to generic n-gram embeddings (using FastText), we experiment with concatenative (stems) and templatic (roots and patterns) morphological subwords.
no code implementations • COLING 2020 • Zhengyang Jiang, Nizar Habash, Muhamed Al Khalil
This demo paper introduces the online Readability Leveled Arabic Thesaurus interface.
no code implementations • LREC 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor
Much of the research on this issue has focused on mitigating gender bias in English NLP models and systems.
no code implementations • SIGDIAL (ACL) 2021 • Alberto Chierici, Tyeece Kiana Fredorcia Hensley, Wahib Kamran, Kertu Koss, Armaan Agrawal, Erin Meekhof, Goffredo Puccetti, Nizar Habash
Time-offset interaction applications (TOIA) allow simulating conversations with people who have previously recorded relevant video utterances, which are played in response to their interacting user.
no code implementations • ACL (SIGMORPHON) 2021 • Tiago Pimentel, Maria Ryskina, Sabrina J. Mielke, Shijie Wu, Eleanor Chodroff, Brian Leonard, Garrett Nicolai, Yustinus Ghanggo Ate, Salam Khalifa, Nizar Habash, Charbel El-Khaissi, Omer Goldman, Michael Gasser, William Lane, Matt Coler, Arturo Oncevay, Jaime Rafael Montoya Samame, Gema Celeste Silva Villegas, Adam Ek, Jean-Philippe Bernardy, Andrey Shcherbakov, Aziyana Bayyr-ool, Karina Sheifer, Sofya Ganieva, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Andrew Krizhanovsky, Natalia Krizhanovsky, Clara Vania, Sardana Ivanova, Aelita Salchak, Christopher Straughn, Zoey Liu, Jonathan North Washington, Duygu Ataman, Witold Kieraś, Marcin Woliński, Totok Suhardijanto, Niklas Stoehr, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Richard J. Hatcher, Emily Prud'hommeaux, Ritesh Kumar, Mans Hulden, Botond Barta, Dorina Lakatos, Gábor Szolnok, Judit Ács, Mohit Raj, David Yarowsky, Ryan Cotterell, Ben Ambridge, Ekaterina Vylomova
This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features.
no code implementations • COLING (WANLP) 2020 • Ali Shazal, Aiza Usman, Nizar Habash
While online Arabic is primarily written using the Arabic script, a Roman-script variety called Arabizi is often seen on social media.
no code implementations • EACL (HumEval) 2021 • Alberto Chierici, Nizar Habash
Our contributions include the annotated dataset, which we make publicly available, and the proposal of Success Rate @k as an evaluation metric that is more appropriate than traditional QA and information retrieval metrics.
no code implementations • 21 Mar 2022 • Moussa Kamal Eddine, Nadi Tomeh, Nizar Habash, Joseph Le Roux, Michalis Vazirgiannis
Like most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora.
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • 25 May 2022 • Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
Results show that using a predictive model results in more natural CS sentences compared to the random approach, as reported in human judgements.
no code implementations • LREC 2022 • Nizar Habash, David Palfreyman
We present ZAEBUC, an annotated Arabic-English bilingual writer corpus comprising short essays by first-year university students at Zayed University in the United Arab Emirates.
no code implementations • LREC 2022 • Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa, Nizar Habash
Our objective is to create a specialized corpus of the Bahraini Arabic dialect, which includes written texts as well as transcripts of audio files, belonging to different genres (folktales, comedy shows, plays, cooking shows, etc.).
no code implementations • LREC 2022 • Nizar Habash, Muhammed AbuOdeh, Dima Taji, Reem Faraj, Jamila El Gizuli, Omar Kallas
We present the Camel Treebank (CAMELTB), a 188K word open-source dependency treebank of Modern Standard and Classical Arabic.
no code implementations • LREC 2022 • Nurpeiis Baimukan, Houda Bouamor, Nizar Habash
We test the value of such aggregation by building language models and using them in dialect identification.
no code implementations • OSACT (LREC) 2022 • Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson, Nizar Habash
This paper presents AraSAS, the first open-source Arabic semantic analysis tagging system.
no code implementations • 11 Oct 2022 • Marwa Gaser, Manuel Mager, Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
For extreme low-resource scenarios, a combination of frequency and morphology-based segmentations is shown to perform the best.
no code implementations • 14 Oct 2022 • Bashar Alhafni, Ossama Obeid, Nizar Habash
We introduce the User-Aware Arabic Gender Rewriter, a user-centric web-based system for Arabic gender rewriting in contexts involving two users.
no code implementations • 19 Oct 2022 • Reem Hazim, Hind Saddiki, Bashar Alhafni, Muhamed Al Khalil, Nizar Habash
This demo paper presents a Google Docs add-on for automatic Arabic word-level readability visualization.
no code implementations • 24 Oct 2022 • Shahd Dibas, Christian Khairallah, Nizar Habash, Omar Fayez Sadi, Tariq Sairafy, Karmel Sarabta, Abrar Ardah
We present Maknuune, a large open lexicon for the Palestinian Arabic dialect.
no code implementations • 22 Oct 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor, Ossama Obeid, Sultan Alrowili, Daliyah AlZeer, Khawlah M. Alshanqiti, Ahmed ElBakry, Muhammad ElNokrashy, Mohamed Gabr, Abderrahmane Issam, Abdelrahim Qaddoumi, K. Vijay-Shanker, Mahmoud Zyate
In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop.
no code implementations • 22 Nov 2022 • Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus.
no code implementations • 22 Nov 2022 • Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali
Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition.
no code implementations • 30 Nov 2022 • Ossama Obeid, Go Inoue, Nizar Habash
We present Camelira, a web-based Arabic multi-dialect morphological disambiguation tool that covers four major variants of Arabic: Modern Standard Arabic, Egyptian, Gulf, and Levantine.
no code implementations • 7 May 2023 • Hazem Ibrahim, Fengyuan Liu, Rohail Asim, Balaraju Battu, Sidahmed Benabderrahmane, Bashar Alhafni, Wifag Adnan, Tuka Alhanai, Bedoor AlShebli, Riyadh Baghdadi, Jocelyn J. Bélanger, Elena Beretta, Kemal Celik, Moumena Chaqfeh, Mohammed F. Daqaq, Zaynab El Bernoussi, Daryl Fougnie, Borja Garcia de Soto, Alberto Gandolfi, Andras Gyorgy, Nizar Habash, J. Andrew Harris, Aaron Kaufman, Lefteris Kirousis, Korhan Kocak, Kangsan Lee, Seungah S. Lee, Samreen Malik, Michail Maniatakos, David Melcher, Azzam Mourad, Minsu Park, Mahmoud Rasras, Alicja Reuben, Dania Zantout, Nancy W. Gleason, Kinga Makovi, Talal Rahwan, Yasir Zaki
Moreover, current AI-text classifiers cannot reliably detect ChatGPT's use in school work, due to their propensity to classify human-written answers as AI-generated, as well as the ease with which AI-generated text can be edited to evade detection.
no code implementations • 23 Oct 2023 • Injy Hamed, Nizar Habash, Ngoc Thang Vu
Linguistic theories and random lexical replacement prove to be effective in the absence of CSW parallel data, where both approaches achieve similar results.
no code implementations • 24 Oct 2023 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash
We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023).
no code implementations • 1 Feb 2024 • Christian Khairallah, Reham Marzouk, Salam Khalifa, Mayar Nassar, Nizar Habash
Modern Standard Arabic (MSA) nominals present many morphological and lexical modeling challenges that have not been consistently addressed previously.
no code implementations • 17 Feb 2024 • Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov
The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels.
no code implementations • 27 Mar 2024 • Injy Hamed, Fadhl Eryani, David Palfreyman, Nizar Habash
We present ZAEBUC-Spoken, a multilingual multidialectal Arabic-English speech corpus.
no code implementations • 22 Apr 2024 • Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov
The task attracted a large number of participants: subtask A monolingual (126), subtask A multilingual (59), subtask B (70), and subtask C (30).