Search Results for author: Nizar Habash

Found 130 papers, 6 papers with code

A Unified Model for Arabizi Detection and Transliteration using Sequence-to-Sequence Models

no code implementations COLING (WANLP) 2020 Ali Shazal, Aiza Usman, Nizar Habash

While online Arabic is primarily written using the Arabic script, a Roman-script variety called Arabizi is often seen on social media.


Gender-Aware Reinflection using Linguistically Enhanced Neural Models

1 code implementation GeBNLP (COLING) 2020 Bashar Alhafni, Nizar Habash, Houda Bouamor

In this paper, we present an approach for sentence-level gender reinflection using linguistically enhanced sequence-to-sequence models.

Grammatical Error Correction

A Cloud-based User-Centered Time-Offset Interaction Application

no code implementations SIGDIAL (ACL) 2021 Alberto Chierici, Tyeece Kiana Fredorcia Hensley, Wahib Kamran, Kertu Koss, Armaan Agrawal, Erin Meekhof, Goffredo Puccetti, Nizar Habash

Time-offset interaction applications (TOIA) allow simulating conversations with people who have previously recorded relevant video utterances, which are played in response to their interacting user.

A View From the Crowd: Evaluation Challenges for Time-Offset Interaction Applications

no code implementations EACL (HumEval) 2021 Alberto Chierici, Nizar Habash

Our contributions include the annotated dataset that we make publicly available and the proposal of Success Rate @k as an evaluation metric that is more appropriate than the traditional QA’s and information retrieval’s metrics.

Question Answering

The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses

no code implementations 18 Oct 2021 Bashar Alhafni, Nizar Habash, Houda Bouamor

Much of the research on this issue has focused on mitigating gender bias in English NLP models and systems.

Machine Translation Text Generation +1

Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

no code implementations 13 Oct 2021 Go Inoue, Salam Khalifa, Nizar Habash

We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models.


NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task

1 code implementation EACL (WANLP) 2021 Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).

Dialect Identification

Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations

no code implementations COLING 2020 Yash Kankanampati, Joseph Le Roux, Nadi Tomeh, Dima Taji, Nizar Habash

In this paper we present a parsing model for projective dependency trees which takes advantage of the existence of complementary dependency annotations which is the case in Arabic, with the availability of CATiB and UD treebanks.

Dependency Parsing

An Online Readability Leveled Arabic Thesaurus

no code implementations COLING 2020 Zhengyang Jiang, Nizar Habash, Muhamed Al Khalil

This demo paper introduces the online Readability Leveled Arabic Thesaurus interface.

Utilizing Subword Entities in Character-Level Sequence-to-Sequence Lemmatization Models

no code implementations COLING 2020 Nasser Zalmout, Nizar Habash

In addition to generic n-gram embeddings (using FastText), we experiment with concatenative (stems) and templatic (roots and patterns) morphological subwords.


NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

no code implementations COLING (WANLP) 2020 Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

The data for the shared task covers a total of 100 provinces from 21 Arab countries and are collected from the Twitter domain.

Dialect Identification

The Paradigm Discovery Problem

1 code implementation ACL 2020 Alexander Erdmann, Micha Elsner, Shijie Wu, Ryan Cotterell, Nizar Habash

Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm.

Word Embeddings

A Large-Scale Leveled Readability Lexicon for Standard Arabic

no code implementations LREC 2020 Muhamed Al Khalil, Nizar Habash, Zhengyang Jiang

We present a large-scale 26,000-lemma leveled readability lexicon for Modern Standard Arabic.

The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems

no code implementations LREC 2020 Alberto Chierici, Nizar Habash, Margarita Bicec

The first challenges are to define a sensible methodology for data collection and to create useful data sets for training the system to retrieve the best answer to a user's question.

Question Answering

A Spelling Correction Corpus for Multiple Arabic Dialects

no code implementations LREC 2020 Fadhl Eryani, Nizar Habash, Houda Bouamor, Salam Khalifa

In this paper, we present the MADAR CODA Corpus, a collection of 10,000 sentences from five Arabic city dialects (Beirut, Cairo, Doha, Rabat, and Tunis) represented in the Conventional Orthography for Dialectal Arabic (CODA) in parallel with their raw original form.

Spelling Correction

Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling

no code implementations ACL 2019 Nasser Zalmout, Nizar Habash

In this paper we explore the use of multitask learning and adversarial training to address morphological richness and dialectal variations in the context of full morphological tagging.

Morphological Tagging Transfer Learning

Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging

no code implementations ACL 2020 Nasser Zalmout, Nizar Habash

Semitic languages can be highly ambiguous, having several interpretations of the same surface forms, and morphologically rich, having many morphemes that realize several morphological features.

Lemmatization Morphological Tagging

A Little Linguistics Goes a Long Way: Unsupervised Segmentation with Limited Language Specific Guidance

no code implementations WS 2019 Alexander Erdmann, Salam Khalifa, Mai Oudah, Nizar Habash, Houda Bouamor

We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input.

The MADAR Shared Task on Arabic Fine-Grained Dialect Identification

no code implementations WS 2019 Houda Bouamor, Sabit Hassan, Nizar Habash

In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification.

Dialect Identification

Automatic Gender Identification and Reinflection in Arabic

no code implementations WS 2019 Nizar Habash, Houda Bouamor, Christine Chung

The impressive progress in many Natural Language Processing (NLP) applications has increased the awareness of some of the biases these NLP systems have with regards to gender identities.

Machine Translation Translation

Simple Automatic Post-editing for Arabic-Japanese Machine Translation

no code implementations 14 Jul 2019 Ella Noll, Mai Oudah, Nizar Habash

A common bottleneck for developing machine translation (MT) systems for some language pairs is the lack of direct parallel translation data sets, in general and in certain domains.

Automatic Post-Editing Translation

The Effectiveness of Simple Hybrid Systems for Hypernym Discovery

no code implementations ACL 2019 William Held, Nizar Habash

Hypernymy modeling has largely been separated according to two paradigms, pattern-based methods and distributional methods.

Hypernym Discovery

ADIDA: Automatic Dialect Identification for Arabic

no code implementations NAACL 2019 Ossama Obeid, Mohammad Salameh, Houda Bouamor, Nizar Habash

This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text.

Dialect Identification

An Arabic Dependency Treebank in the Travel Domain

no code implementations 29 Jan 2019 Dima Taji, Jamila El Gizuli, Nizar Habash

In this paper we present a dependency treebank of travel domain sentences in Modern Standard Arabic.


An Arabic Morphological Analyzer and Generator with Copious Features

no code implementations WS 2018 Dima Taji, Salam Khalifa, Ossama Obeid, Fadhl Eryani, Nizar Habash

We introduce CALIMA-Star, a very rich Arabic morphological analyzer and generator that provides functional and form-based morphological features as well as built-in tokenization, phonological representation, lexical rationality and much more.


Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models

no code implementations EMNLP 2018 Daniel Watson, Nasser Zalmout, Nizar Habash

We show that providing the model with word-level features bridges the gap for the neural network approach to achieve a state-of-the-art F1 score on a standard Arabic language correction shared task dataset.

Word Embeddings

Fine-Grained Arabic Dialect Identification

no code implementations COLING 2018 Mohammad Salameh, Houda Bouamor, Nizar Habash

Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification).

Classification Dialect Identification +3

Improving Domain Independent Question Parsing with Synthetic Treebanks

no code implementations COLING 2018 Halim-Antoine Boukaram, Nizar Habash, Micheline Ziadee, Majd Sakr

Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks.

Addressing Noise in Multidialectal Word Embeddings

no code implementations ACL 2018 Alexander Erdmann, Nasser Zalmout, Nizar Habash

Arabic dialects lack large corpora and are noisy, being linguistically disparate with no standardized spelling.

Transliteration Word Embeddings

Noise-Robust Morphological Disambiguation for Dialectal Arabic

no code implementations NAACL 2018 Nasser Zalmout, Alexander Erdmann, Nizar Habash

User-generated text tends to be noisy with many lexical and orthographic inconsistencies, making natural language processing (NLP) tasks more challenging.

Lexical Normalization Morphological Analysis +3

Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic

no code implementations 18 Dec 2017 Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor

We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus.

Machine Translation Translation

Don't Throw Those Morphological Analyzers Away Just Yet: Neural Morphological Disambiguation for Arabic

no code implementations EMNLP 2017 Nasser Zalmout, Nizar Habash

We make use of the resulting morphological models for scoring and ranking the analyses of the morphological analyzer for morphological disambiguation.

Feature Engineering Language Modelling +3

OMAM at SemEval-2017 Task 4: English Sentiment Analysis with Conditional Random Fields

no code implementations SEMEVAL 2017 Chukwuyem Onyibe, Nizar Habash

We describe a supervised system that uses optimized Conditional Random Fields and lexical features to predict the sentiment of a tweet.

Opinion Mining Sentiment Analysis +1

Robust Dictionary Lookup in Multiple Noisy Orthographies

no code implementations WS 2017 Lingliang Zhang, Nizar Habash, Godfried Toussaint

We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard.


Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine

no code implementations COLING 2016 Ramy Eskander, Nizar Habash, Owen Rambow, Arfath Pasha

Arabic dialects present a special problem for natural language processing because there are few resources, they have no standard orthography, and have not been studied much.

Morphological Analysis

CamelParser: A system for Arabic Syntactic Analysis and Morphological Disambiguation

no code implementations COLING 2016 Anas Shahrour, Salam Khalifa, Dima Taji, Nizar Habash

In this paper, we present CamelParser, a state-of-the-art system for Arabic syntactic dependency analysis aligned with contextually disambiguated morphological features.

Dependency Parsing Morphological Analysis +2

Morphological Constraints for Phrase Pivot Statistical Machine Translation

no code implementations 12 Sep 2016 Ahmed El Kholy, Nizar Habash

One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages.

Machine Translation Translation

A Large Scale Corpus of Gulf Arabic

no code implementations LREC 2016 Salam Khalifa, Nizar Habash, Dana Abdulrahim, Sara Hassan

Most Arabic natural language processing tools and resources are developed to serve Modern Standard Arabic (MSA), which is the official written language in the Arab World.


First Result on Arabic Neural Machine Translation

no code implementations 8 Jun 2016 Amjad Almahairi, Kyunghyun Cho, Nizar Habash, Aaron Courville

Neural machine translation has become a major alternative to widely used phrase-based statistical machine translation.

Machine Translation Translation

Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation

no code implementations LREC 2016 Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer

We present our guidelines and annotation procedure to create a human-corrected, post-edited machine translation corpus for Modern Standard Arabic.

Machine Translation Translation

Arabic Corpora for Credibility Analysis

no code implementations LREC 2016 Ayman Al Zaatari, Rim El Ballouli, Shady ELbassouni, Wassim El-Hajj, Hazem Hajj, Khaled Shaban, Nizar Habash, Emad Yahya

We focus on Arabic due to the recent popularity of blogs and microblogs in the Arab World and due to the lack of any such public corpora in Arabic.

General Classification

DALILA: The Dialectal Arabic Linguistic Learning Assistant

no code implementations LREC 2016 Salam Khalifa, Houda Bouamor, Nizar Habash

Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP).

Applying the Cognitive Machine Translation Evaluation Approach to Arabic

no code implementations LREC 2016 Irina Temnikova, Wajdi Zaghouani, Stephan Vogel, Nizar Habash

The goal of the cognitive machine translation (MT) evaluation approach is to build classifiers which assign post-editing effort scores to new texts.

Machine Translation Translation

Developing an Egyptian Arabic Treebank: Impact of Dialectal Morphology on Annotation and Tool Development

no code implementations LREC 2014 Mohamed Maamouri, Ann Bies, Seth Kulick, Michael Ciul, Nizar Habash, Ramy Eskander

This paper describes the parallel development of an Egyptian Arabic Treebank and a morphological analyzer for Egyptian Arabic (CALIMA).

MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic

no code implementations LREC 2014 Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan Roth

In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007).

Chunking Lemmatization +6

Large Scale Arabic Error Annotation: Guidelines and Framework

no code implementations LREC 2014 Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Ossama Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah Alkuhlani, Kemal Oflazer

Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

Machine Translation

LDC Arabic Treebanks and Associated Corpora: Data Divisions Manual

no code implementations 22 Sep 2013 Mona Diab, Nizar Habash, Owen Rambow, Ryan Roth

The Linguistic Data Consortium (LDC) has developed hundreds of data corpora for natural language processing (NLP) research.
