1 code implementation • GeBNLP (COLING) 2020 • Bashar Alhafni, Nizar Habash, Houda Bouamor
In this paper, we present an approach for sentence-level gender reinflection using linguistically enhanced sequence-to-sequence models.
no code implementations • OSACT (LREC) 2022 • Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson, Nizar Habash
This paper presents AraSAS, the first open-source Arabic semantic analysis tagging system.
no code implementations • LREC 2022 • Nurpeiis Baimukan, Houda Bouamor, Nizar Habash
We test the value of such aggregation by building language models and using them in dialect identification.
no code implementations • LREC 2022 • Nizar Habash, Muhammed AbuOdeh, Dima Taji, Reem Faraj, Jamila El Gizuli, Omar Kallas
We present the Camel Treebank (CAMELTB), a 188K word open-source dependency treebank of Modern Standard and Classical Arabic.
no code implementations • LREC 2022 • Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa, Nizar Habash
Our objective is to create a specialized corpus of the Bahraini Arabic dialect, which includes written texts as well as transcripts of audio files, belonging to different genres (folktales, comedy shows, plays, cooking shows, etc.).
no code implementations • LREC 2022 • Nizar Habash, David Palfreyman
We present ZAEBUC, an annotated Arabic-English bilingual writer corpus comprising short essays by first-year university students at Zayed University in the United Arab Emirates.
1 code implementation • NAACL (SIGMORPHON) 2022 • Nizar Habash, Reham Marzouk, Christian Khairallah, Salam Khalifa
Arabic is a morphologically rich and complex language, with numerous dialectal variants.
no code implementations • EACL (HumEval) 2021 • Alberto Chierici, Nizar Habash
Our contributions include the annotated dataset that we make publicly available and the proposal of Success Rate @k as an evaluation metric that is more appropriate than traditional QA and information retrieval metrics.
no code implementations • COLING (WANLP) 2020 • Ali Shazal, Aiza Usman, Nizar Habash
While online Arabic is primarily written using the Arabic script, a Roman-script variety called Arabizi is often seen on social media.
no code implementations • ACL (SIGMORPHON) 2021 • Tiago Pimentel, Maria Ryskina, Sabrina J. Mielke, Shijie Wu, Eleanor Chodroff, Brian Leonard, Garrett Nicolai, Yustinus Ghanggo Ate, Salam Khalifa, Nizar Habash, Charbel El-Khaissi, Omer Goldman, Michael Gasser, William Lane, Matt Coler, Arturo Oncevay, Jaime Rafael Montoya Samame, Gema Celeste Silva Villegas, Adam Ek, Jean-Philippe Bernardy, Andrey Shcherbakov, Aziyana Bayyr-ool, Karina Sheifer, Sofya Ganieva, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Andrew Krizhanovsky, Natalia Krizhanovsky, Clara Vania, Sardana Ivanova, Aelita Salchak, Christopher Straughn, Zoey Liu, Jonathan North Washington, Duygu Ataman, Witold Kieraś, Marcin Woliński, Totok Suhardijanto, Niklas Stoehr, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Richard J. Hatcher, Emily Prud'hommeaux, Ritesh Kumar, Mans Hulden, Botond Barta, Dorina Lakatos, Gábor Szolnok, Judit Ács, Mohit Raj, David Yarowsky, Ryan Cotterell, Ben Ambridge, Ekaterina Vylomova
This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features.
1 code implementation • UDW (COLING) 2020 • Dima Taji, Nizar Habash
We present PALMYRA 2.0, a graphical dependency-tree visualization and editing software.
no code implementations • SIGDIAL (ACL) 2021 • Alberto Chierici, Tyeece Kiana Fredorcia Hensley, Wahib Kamran, Kertu Koss, Armaan Agrawal, Erin Meekhof, Goffredo Puccetti, Nizar Habash
Time-offset interaction applications (TOIA) allow simulating conversations with people who have previously recorded relevant video utterances, which are played in response to their interacting user.
no code implementations • 30 Nov 2022 • Ossama Obeid, Go Inoue, Nizar Habash
We present Camelira, a web-based Arabic multi-dialect morphological disambiguation tool that covers four major variants of Arabic: Modern Standard Arabic, Egyptian, Gulf, and Levantine.
no code implementations • 22 Nov 2022 • Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali
Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition.
no code implementations • 22 Nov 2022 • Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus.
no code implementations • 24 Oct 2022 • Shahd Dibas, Christian Khairallah, Nizar Habash, Omar Fayez Sadi, Tariq Sairafy, Karmel Sarabta, Abrar Ardah
We present Maknuune, a large open lexicon for the Palestinian Arabic dialect.
no code implementations • 22 Oct 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor, Ossama Obeid, Sultan Alrowili, Daliyah AlZeer, Khawlah M. Alshanqiti, Ahmed ElBakry, Muhammad ElNokrashy, Mohamed Gabr, Abderrahmane Issam, Abdelrahim Qaddoumi, K. Vijay-Shanker, Mahmoud Zyate
In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop.
no code implementations • 19 Oct 2022 • Reem Hazim, Hind Saddiki, Bashar Alhafni, Muhamed Al Khalil, Nizar Habash
This demo paper presents a Google Docs add-on for automatic Arabic word-level readability visualization.
1 code implementation • 18 Oct 2022 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash
We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI 2022).
no code implementations • 14 Oct 2022 • Bashar Alhafni, Ossama Obeid, Nizar Habash
We introduce the User-Aware Arabic Gender Rewriter, a user-centric web-based system for Arabic gender rewriting in contexts involving two users.
no code implementations • 11 Oct 2022 • Marwa Gaser, Manuel Mager, Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
For extreme low-resource scenarios, a combination of frequency and morphology-based segmentations is shown to perform the best.
no code implementations • 25 May 2022 • Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
Our best models achieve a 33.6% improvement in perplexity, +3.2 to 5.6 BLEU points on the MT task, and a 7% relative improvement in WER on the ASR task.
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
1 code implementation • NAACL 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor
In this paper, we define the task of gender rewriting in contexts involving two users (I and/or You) - first and second grammatical persons with independent grammatical gender preferences.
no code implementations • 21 Mar 2022 • Moussa Kamal Eddine, Nadi Tomeh, Nizar Habash, Joseph Le Roux, Michalis Vazirgiannis
Like most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora.
no code implementations • LREC 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor
Much of the research on this issue has focused on mitigating gender bias in English NLP models and systems.
1 code implementation • Findings (ACL) 2022 • Go Inoue, Salam Khalifa, Nizar Habash
We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models.
1 code implementation • CoNLL (EMNLP) 2021 • Riadh Belkebir, Nizar Habash
We present ARETA, an automatic error type annotation system for Modern Standard Arabic.
1 code implementation • EACL (WANLP) 2021 • Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, Nizar Habash
In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models.
1 code implementation • EACL (WANLP) 2021 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash
This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).
no code implementations • COLING 2020 • Yash Kankanampati, Joseph Le Roux, Nadi Tomeh, Dima Taji, Nizar Habash
In this paper, we present a parsing model for projective dependency trees that takes advantage of complementary dependency annotations, as is the case for Arabic, where both CATiB and UD treebanks are available.
no code implementations • COLING 2020 • Zhengyang Jiang, Nizar Habash, Muhamed Al Khalil
This demo paper introduces the online Readability Leveled Arabic Thesaurus interface.
no code implementations • COLING 2020 • Nasser Zalmout, Nizar Habash
In addition to generic n-gram embeddings (using FastText), we experiment with concatenative (stems) and templatic (roots and patterns) morphological subwords.
no code implementations • 25 Nov 2020 • Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Huseein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak
The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.
no code implementations • COLING (WANLP) 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash
The data for the shared task cover a total of 100 provinces from 21 Arab countries and are collected from the Twitter domain.
1 code implementation • ACL 2020 • Alexander Erdmann, Micha Elsner, Shijie Wu, Ryan Cotterell, Nizar Habash
Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm.
1 code implementation • LREC 2020 • Ossama Obeid, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, Nizar Habash
We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python.
no code implementations • LREC 2020 • Salam Khalifa, Nasser Zalmout, Nizar Habash
In this paper we present the first full morphological analysis and disambiguation system for Gulf Arabic.
no code implementations • LREC 2020 • Alberto Chierici, Nizar Habash, Margarita Bicec
The first challenges are to define a sensible methodology for data collection and to create useful data sets for training the system to retrieve the best answer to a user's question.
no code implementations • LREC 2020 • Muhamed Al Khalil, Nizar Habash, Zhengyang Jiang
We present a large-scale 26,000-lemma leveled readability lexicon for Modern Standard Arabic.
no code implementations • LREC 2020 • Fadhl Eryani, Nizar Habash, Houda Bouamor, Salam Khalifa
In this paper, we present the MADAR CODA Corpus, a collection of 10,000 sentences from five Arabic city dialects (Beirut, Cairo, Doha, Rabat, and Tunis) represented in the Conventional Orthography for Dialectal Arabic (CODA) in parallel with their raw original form.
no code implementations • ACL 2019 • Nasser Zalmout, Nizar Habash
In this paper we explore the use of multitask learning and adversarial training to address morphological richness and dialectal variations in the context of full morphological tagging.
no code implementations • ACL 2020 • Nasser Zalmout, Nizar Habash
Semitic languages can be highly ambiguous, having several interpretations of the same surface forms, and morphologically rich, having many morphemes that realize several morphological features.
no code implementations • WS 2019 • Faisal Alshargi, Shahd Dibas, Sakhar Alkhereyf, Reem Faraj, Basmah Abdulkareem, Sane Yagi, Ouafaa Kacha, Nizar Habash, Owen Rambow
These corpora will be publicly available to serve as benchmarks for training and evaluating systems for Arabic dialect morphological analysis and disambiguation.
no code implementations • WS 2019 • Houda Bouamor, Sabit Hassan, Nizar Habash
In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification.
no code implementations • WS 2019 • Nizar Habash, Houda Bouamor, Christine Chung
The impressive progress in many Natural Language Processing (NLP) applications has increased awareness of some of the biases these NLP systems have with regard to gender identities.
no code implementations • WS 2019 • Alexander Erdmann, Salam Khalifa, Mai Oudah, Nizar Habash, Houda Bouamor
We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language-specific input.
no code implementations • 14 Jul 2019 • Ella Noll, Mai Oudah, Nizar Habash
A common bottleneck for developing machine translation (MT) systems for some language pairs is the lack of direct parallel translation data sets, in general and in certain domains.
no code implementations • ACL 2019 • William Held, Nizar Habash
Hypernymy modeling has largely been separated according to two paradigms, pattern-based methods and distributional methods.
no code implementations • WS 2019 • Mai Oudah, Amjad Almahairi, Nizar Habash
Neural networks have become the state-of-the-art approach for machine translation (MT) in many languages.
no code implementations • NAACL 2019 • Ossama Obeid, Mohammad Salameh, Houda Bouamor, Nizar Habash
This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text.
no code implementations • 29 Jan 2019 • Dima Taji, Jamila El Gizuli, Nizar Habash
In this paper we present a dependency treebank of travel domain sentences in Modern Standard Arabic.
no code implementations • WS 2018 • Alexander Erdmann, Nizar Habash
Morphologically rich languages are challenging for natural language processing tasks due to data sparsity.
no code implementations • WS 2018 • Dima Taji, Salam Khalifa, Ossama Obeid, Fadhl Eryani, Nizar Habash
We introduce CALIMA-Star, a very rich Arabic morphological analyzer and generator that provides functional and form-based morphological features as well as built-in tokenization, phonological representation, lexical rationality and much more.
no code implementations • EMNLP 2018 • Daniel Watson, Nasser Zalmout, Nizar Habash
We show that providing the model with word-level features bridges the gap for the neural network approach to achieve a state-of-the-art F1 score on a standard Arabic language correction shared task dataset.
no code implementations • LREC 2018 • Ossama Obeid, Salam Khalifa, Nizar Habash, Houda Bouamor, Wajdi Zaghouani, Kemal Oflazer
In this paper, we introduce MADARi, a joint morphological annotation and spelling correction system for texts in Standard and Dialectal Arabic.
no code implementations • COLING 2018 • Shehroze Khan, Jihyun Kim, Tarik Zulfikarpasic, Peter Chen, Nizar Habash
We present Qutr (Query Translator), a smart cross-lingual communication application for the travel domain.
no code implementations • COLING 2018 • Mohammad Salameh, Houda Bouamor, Nizar Habash
Previous work on the problem of Arabic Dialect Identification typically targeted five coarse-grained dialect classes plus Standard Arabic (6-way classification).
no code implementations • COLING 2018 • Halim-Antoine Boukaram, Nizar Habash, Micheline Ziadee, Majd Sakr
Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks.
no code implementations • WS 2018 • Hind Saddiki, Nizar Habash, Violetta Cavalli-Sforza, Muhamed Al Khalil
Advances in automatic readability assessment can impact the way people consume information in a number of domains.
no code implementations • ACL 2018 • Alexander Erdmann, Nasser Zalmout, Nizar Habash
Arabic dialects lack large corpora and are noisy, being linguistically disparate with no standardized spelling.
no code implementations • WS 2018 • Dana Abu Ali, Muaz Ahmad, Hayat Al Hassan, Paula Dozsa, Ming Hu, Jose Varias, Nizar Habash
This demonstration paper presents a bilingual (Arabic-English) interactive human avatar dialogue system.
no code implementations • NAACL 2018 • Nasser Zalmout, Alexander Erdmann, Nizar Habash
User-generated text tends to be noisy with many lexical and orthographic inconsistencies, making natural language processing (NLP) tasks more challenging.
no code implementations • LREC 2018 • Nizar Habash, Fadhl Eryani, Salam Khalifa, Owen Rambow, Dana Abdulrahim, Alexander Erdmann, Reem Faraj, Wajdi Zaghouani, Houda Bouamor, Nasser Zalmout, Sara Hassan, Faisal Al-Shargi, Sakhar Alkhereyf, Basma Abdulkareem, Ramy Eskander, Mohammad Salameh, Hind Saddiki
no code implementations • MTSummit 2017 • Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor
We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus.
no code implementations • EMNLP 2017 • Nasser Zalmout, Nizar Habash
We make use of the resulting morphological models for scoring and ranking the analyses of the morphological analyzer for morphological disambiguation.
no code implementations • CONLL 2017 • Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.
no code implementations • SEMEVAL 2017 • Ramy Baly, Gilbert Badaro, Ali Hamdi, Rawan Moukalled, Rita Aoun, Georges El-Khoury, Ahmad Al Sallab, Hazem Hajj, Nizar Habash, Khaled Shaban, Wassim El-Hajj
While sentiment analysis in English has achieved significant progress, it remains a challenging task in Arabic given the rich morphology of the language.
no code implementations • SEMEVAL 2017 • Chukwuyem Onyibe, Nizar Habash
We describe a supervised system that uses optimized Conditional Random Fields and lexical features to predict the sentiment of a tweet.
no code implementations • WS 2017 • Dima Taji, Nizar Habash, Daniel Zeman
We describe the process of creating NUDAR, a Universal Dependency treebank for Arabic.
no code implementations • WS 2017 • Lingliang Zhang, Nizar Habash, Godfried Toussaint
We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard.
no code implementations • WS 2017 • Ramy Baly, Gilbert Badaro, Georges El-Khoury, Rawan Moukalled, Rita Aoun, Hazem Hajj, Wassim El-Hajj, Nizar Habash, Khaled Shaban
Opinion mining in Arabic is a challenging task given the rich morphology of the language.
no code implementations • WS 2017 • Salam Khalifa, Sara Hassan, Nizar Habash
We present CALIMAGLF, a Gulf Arabic morphological analyzer currently covering over 2,600 verbal lemmas.
no code implementations • EACL 2017 • Nizar Habash, Nasser Zalmout, Dima Taji, Hieu Hoang, Maverick Alzate
We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic.
no code implementations • COLING 2016 • Ramy Eskander, Nizar Habash, Owen Rambow, Arfath Pasha
Arabic dialects present a special problem for natural language processing because they have few resources, have no standard orthography, and have not been studied much.
no code implementations • COLING 2016 • Francisco Guzmán, Houda Bouamor, Ramy Baly, Nizar Habash
Evaluation of machine translation (MT) into morphologically rich languages (MRL) has not been well studied despite posing many challenges.
no code implementations • COLING 2016 • Anas Shahrour, Salam Khalifa, Dima Taji, Nizar Habash
In this paper, we present CamelParser, a state-of-the-art system for Arabic syntactic dependency analysis aligned with contextually disambiguated morphological features.
no code implementations • COLING 2016 • Salam Khalifa, Nasser Zalmout, Nizar Habash
In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator.
no code implementations • COLING 2016 • Dana Abu Ali, Nizar Habash
This paper presents BOTTA, the first Arabic dialect chatbot.
no code implementations • WS 2016 • Nasser Zalmout, Hind Saddiki, Nizar Habash
Much research in education has been done on the study of different language teaching methods.
no code implementations • 12 Sep 2016 • Ahmed El Kholy, Nizar Habash
One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages.
no code implementations • LREC 2016 • Salam Khalifa, Nizar Habash, Dana Abdulrahim, Sara Hassan
Most Arabic natural language processing tools and resources are developed to serve Modern Standard Arabic (MSA), which is the official written language in the Arab World.
no code implementations • 18 Jun 2016 • Hassan Sajjad, Nadir Durrani, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Stephan Vogel, Wael Salloum, Ahmed El Kholy, Nizar Habash
The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech.
no code implementations • 8 Jun 2016 • Amjad Almahairi, Kyunghyun Cho, Nizar Habash, Aaron Courville
Neural machine translation has become a major alternative to widely used phrase-based statistical machine translation.
no code implementations • LREC 2016 • Faisal Al-Shargi, Aidan Kaplan, Ramy Eskander, Nizar Habash, Owen Rambow
We present new language resources for Moroccan and Sanaani Yemeni Arabic.
no code implementations • LREC 2016 • Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer
We present our guidelines and annotation procedure to create a human corrected machine translated post-edited corpus for the Modern Standard Arabic.
no code implementations • LREC 2016 • Nizar Habash, Anas Shahrour, Muhamed Al-Khalil
We present a novel technique for Arabic morphological annotation.
no code implementations • LREC 2016 • Ayman Al Zaatari, Rim El Ballouli, Shady ELbassouni, Wassim El-Hajj, Hazem Hajj, Khaled Shaban, Nizar Habash, Emad Yahya
We focus on Arabic due to the recent popularity of blogs and microblogs in the Arab World and due to the lack of any such public corpora in Arabic.
no code implementations • LREC 2016 • Mohamed Al-Badrashiny, Arfath Pasha, Mona Diab, Nizar Habash, Owen Rambow, Wael Salloum, Ramy Eskander
Text preprocessing is an important and necessary task for all NLP applications.
no code implementations • LREC 2016 • Irina Temnikova, Wajdi Zaghouani, Stephan Vogel, Nizar Habash
The goal of the cognitive machine translation (MT) evaluation approach is to build classifiers which assign post-editing effort scores to new texts.
no code implementations • LREC 2016 • Salam Khalifa, Houda Bouamor, Nizar Habash
Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP).
no code implementations • WS 2014 • Ann Bies, Zhiyi Song, Mohamed Maamouri, Stephen Grimes, Haejoong Lee, Jonathan Wright, Stephanie Strassel, Nizar Habash, Ramy Eskander, Owen Rambow
no code implementations • LREC 2014 • Mohamed Maamouri, Ann Bies, Seth Kulick, Michael Ciul, Nizar Habash, Ramy Eskander
This paper describes the parallel development of an Egyptian Arabic Treebank and a morphological analyzer for Egyptian Arabic (CALIMA).
no code implementations • LREC 2014 • Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Ossama Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah Alkuhlani, Kemal Oflazer
Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.
no code implementations • LREC 2014 • Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan Roth
In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007).
no code implementations • LREC 2014 • Houda Bouamor, Nizar Habash, Kemal Oflazer
The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic.
no code implementations • LREC 2014 • Abir Masmoudi, Mariem Ellouze Khmekhem, Yannick Estève, Lamia Hadrich Belguith, Nizar Habash
In this paper we describe an effort to create a corpus and phonetic dictionary for Tunisian Arabic Automatic Speech Recognition (ASR).
no code implementations • LREC 2014 • Inès Zribi, Rahma Boujelbane, Abir Masmoudi, Mariem Ellouze, Lamia Belguith, Nizar Habash
Tunisian Arabic is a dialect of the Arabic language spoken in Tunisia.
no code implementations • LREC 2014 • Mona Diab, Mohamed Al-Badrashiny, Maryam Aminian, Mohammed Attia, Heba Elfardy, Nizar Habash, Abdelati Hawwari, Wael Salloum, Pradeep Dasigi, Ramy Eskander
Multiple levels of quality checks are performed on the output of each step in the creation process.
no code implementations • WS 2013 • Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, Eric Villemonte de la Clergerie
no code implementations • 22 Sep 2013 • Mona Diab, Nizar Habash, Owen Rambow, Ryan Roth
The Linguistic Data Consortium (LDC) has developed hundreds of data corpora for natural language processing (NLP) research.
no code implementations • JEPTALNRECITAL 2013 • Ahmed Hamdi, Rahma Boujelbane, Nizar Habash, Alexis Nasr
no code implementations • LREC 2012 • Nizar Habash, Mona Diab, Owen Rambow
Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world.