Search Results for author: Nizar Habash

The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models.

Language Modelling, Multiple-choice (+1 more)

Automatic Error Type Annotation for Arabic

2 code implementations • CoNLL (EMNLP) 2021 • Riadh Belkebir, Nizar Habash

We present ARETA, an automatic error type annotation system for Modern Standard Arabic.

Grammatical Error Correction, Vocal Bursts Type Prediction

PALMYRA 2.0: A Configurable Multilingual Platform Independent Tool for Morphology and Syntax Annotation

1 code implementation • UDW (COLING) 2020 • Dima Taji, Nizar Habash

We present PALMYRA 2.0, a graphical dependency-tree visualization and editing software.

Dependency Parsing

Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation

1 code implementation • 24 May 2023 • Bashar Alhafni, Go Inoue, Christian Khairallah, Nizar Habash

We also define the task of multi-class Arabic grammatical error detection (GED) and present the first results on multi-class Arabic GED.

Grammatical Error Detection

NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task

1 code implementation • EACL (WANLP) 2021 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).

Dialect Identification

Gender-Aware Reinflection using Linguistically Enhanced Neural Models

1 code implementation • GeBNLP (COLING) 2020 • Bashar Alhafni, Nizar Habash, Houda Bouamor

In this paper, we present an approach for sentence-level gender reinflection using linguistically enhanced sequence-to-sequence models.

Grammatical Error Correction, Sentence

Morphotactic Modeling in an Open-source Multi-dialectal Arabic Morphological Analyzer and Generator

1 code implementation • NAACL (SIGMORPHON) 2022 • Nizar Habash, Reham Marzouk, Christian Khairallah, Salam Khalifa

Arabic is a morphologically rich and complex language, with numerous dialectal variants.

NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task

1 code implementation • 18 Oct 2022 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI 2022).

Dialect Identification, Sentiment Analysis (+1 more)

Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

1 code implementation • Findings (ACL) 2022 • Go Inoue, Salam Khalifa, Nizar Habash

We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models.

User-Centric Gender Rewriting

1 code implementation • NAACL 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor

In this paper, we define the task of gender rewriting in contexts involving two users (I and/or You), i.e., first and second grammatical persons with independent grammatical gender preferences.

The Paradigm Discovery Problem

1 code implementation • ACL 2020 • Alexander Erdmann, Micha Elsner, Shijie Wu, Ryan Cotterell, Nizar Habash

Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm.

Clustering, Word Embeddings

Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching

1 code implementation • 30 Jan 2024 • Kurt Micallef, Nizar Habash, Claudia Borg, Fadhl Eryani, Houda Bouamor

Although multilingual language models exhibit impressive cross-lingual transfer capabilities on unseen languages, the performance on downstream tasks is impacted when there is a script disparity with the languages used in the multilingual model's pre-training data.

Cross-Lingual Transfer, Transliteration

Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic

no code implementations • MTSummit 2017 • Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor

We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus.

Machine Translation, Translation

Morphological Constraints for Phrase Pivot Statistical Machine Translation

no code implementations • 12 Sep 2016 • Ahmed El Kholy, Nizar Habash

One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages.

Machine Translation, Translation

A Large Scale Corpus of Gulf Arabic

no code implementations • LREC 2016 • Salam Khalifa, Nizar Habash, Dana Abdulrahim, Sara Hassan

Most Arabic natural language processing tools and resources are developed to serve Modern Standard Arabic (MSA), which is the official written language in the Arab World.

Egyptian Arabic to English Statistical Machine Translation System for NIST OpenMT'2015

no code implementations • 18 Jun 2016 • Hassan Sajjad, Nadir Durrani, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Stephan Vogel, Wael Salloum, Ahmed El Kholy, Nizar Habash

The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech.

Language Modelling, Translation (+1 more)

First Result on Arabic Neural Machine Translation

no code implementations • 8 Jun 2016 • Amjad Almahairi, Kyunghyun Cho, Nizar Habash, Aaron Courville

Neural machine translation has become a major alternative to widely used phrase-based statistical machine translation.

Machine Translation, Translation

LDC Arabic Treebanks and Associated Corpora: Data Divisions Manual

no code implementations • 22 Sep 2013 • Mona Diab, Nizar Habash, Owen Rambow, Ryan Roth

The Linguistic Data Consortium (LDC) has developed hundreds of data corpora for natural language processing (NLP) research.

MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction

no code implementations • LREC 2018 • Ossama Obeid, Salam Khalifa, Nizar Habash, Houda Bouamor, Wajdi Zaghouani, Kemal Oflazer

In this paper, we introduce MADARi, a joint morphological annotation and spelling correction system for texts in Standard and Dialectal Arabic.

Dialect Identification, LEMMA (+2 more)

Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models

no code implementations • EMNLP 2018 • Daniel Watson, Nasser Zalmout, Nizar Habash

We show that providing the model with word-level features bridges the gap for the neural network approach to achieve a state-of-the-art F1 score on a standard Arabic language correction shared task dataset.

Word Embeddings

Addressing Noise in Multidialectal Word Embeddings

no code implementations • ACL 2018 • Alexander Erdmann, Nasser Zalmout, Nizar Habash

Arabic dialects lack large corpora and are noisy, being linguistically disparate with no standardized spelling.

Sentence, Transliteration (+1 more)

A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages

no code implementations • EACL 2017 • Nizar Habash, Nasser Zalmout, Dima Taji, Hieu Hoang, Maverick Alzate

We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic.

Benchmarking, Machine Translation (+1 more)

Noise-Robust Morphological Disambiguation for Dialectal Arabic

no code implementations • NAACL 2018 • Nasser Zalmout, Alexander Erdmann, Nizar Habash

User-generated text tends to be noisy with many lexical and orthographic inconsistencies, making natural language processing (NLP) tasks more challenging.

Lexical Normalization, Morphological Analysis (+3 more)

OMAM at SemEval-2017 Task 4: Evaluation of English State-of-the-Art Sentiment Analysis Models for Arabic and a New Topic-based Model

no code implementations • SEMEVAL 2017 • Ramy Baly, Gilbert Badaro, Ali Hamdi, Rawan Moukalled, Rita Aoun, Georges El-Khoury, Ahmad Al Sallab, Hazem Hajj, Nizar Habash, Khaled Shaban, Wassim El-Hajj

While sentiment analysis in English has achieved significant progress, it remains a challenging task in Arabic given the rich morphology of the language.

Opinion Mining, Sentiment Analysis

OMAM at SemEval-2017 Task 4: English Sentiment Analysis with Conditional Random Fields

no code implementations • SEMEVAL 2017 • Chukwuyem Onyibe, Nizar Habash

We describe a supervised system that uses optimized Conditional Random Fields and lexical features to predict the sentiment of a tweet.

Opinion Mining, Sentiment Analysis (+1 more)

Don't Throw Those Morphological Analyzers Away Just Yet: Neural Morphological Disambiguation for Arabic

no code implementations • EMNLP 2017 • Nasser Zalmout, Nizar Habash

We make use of the resulting morphological models for scoring and ranking the analyses of the morphological analyzer for morphological disambiguation.

Feature Engineering, Language Modelling (+3 more)

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

no code implementations • CONLL 2017 • Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.

Dependency Parsing

Feature Optimization for Predicting Readability of Arabic L1 and L2

no code implementations • WS 2018 • Hind Saddiki, Nizar Habash, Violetta Cavalli-Sforza, Muhamed Al Khalil

Advances in automatic readability assessment can impact the way people consume information in a number of domains.

Language Modelling

Improving Domain Independent Question Parsing with Synthetic Treebanks

no code implementations • COLING 2018 • Halim-Antoine Boukaram, Nizar Habash, Micheline Ziadee, Majd Sakr

Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks.

A Bilingual Interactive Human Avatar Dialogue System

no code implementations • WS 2018 • Dana Abu Ali, Muaz Ahmad, Hayat Al Hassan, Paula Dozsa, Ming Hu, Jose Varias, Nizar Habash

This demonstration paper presents a bilingual (Arabic-English) interactive human avatar dialogue system.

Answer Selection, Automatic Speech Recognition (ASR)

Complementary Strategies for Low Resourced Morphological Modeling

no code implementations • WS 2018 • Alexander Erdmann, Nizar Habash

Morphologically rich languages are challenging for natural language processing tasks due to data sparsity.

Morphological Analysis, Word Embeddings

An Arabic Morphological Analyzer and Generator with Copious Features

no code implementations • WS 2018 • Dima Taji, Salam Khalifa, Ossama Obeid, Fadhl Eryani, Nizar Habash

We introduce CALIMA-Star, a very rich Arabic morphological analyzer and generator that provides functional and form-based morphological features as well as built-in tokenization, phonological representation, lexical rationality and much more.

A Morphological Analyzer for Gulf Arabic Verbs

no code implementations • WS 2017 • Salam Khalifa, Sara Hassan, Nizar Habash

We present CALIMAGLF, a Gulf Arabic morphological analyzer currently covering over 2,600 verbal lemmas.

Morphological Tagging, Part-Of-Speech Tagging (+2 more)

A Characterization Study of Arabic Twitter Data with a Benchmarking for State-of-the-Art Opinion Mining Models

no code implementations • WS 2017 • Ramy Baly, Gilbert Badaro, Georges El-Khoury, Rawan Moukalled, Rita Aoun, Hazem Hajj, Wassim El-Hajj, Nizar Habash, Khaled Shaban

Opinion mining in Arabic is a challenging task given the rich morphology of the language.

Benchmarking, Feature Engineering (+2 more)

Robust Dictionary Lookup in Multiple Noisy Orthographies

no code implementations • WS 2017 • Lingliang Zhang, Nizar Habash, Godfried Toussaint

We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard.

Transliteration

Universal Dependencies for Arabic

no code implementations • WS 2017 • Dima Taji, Nizar Habash, Daniel Zeman

We describe the process of creating NUDAR, a Universal Dependency treebank for Arabic.

Machine Translation, Question Answering

The Columbia University - New York University Abu Dhabi SIGMORPHON 2016 Morphological Reinflection Shared Task Submission

no code implementations • WS 2016 • Dima Taji, Ramy Eskander, Nizar Habash, Owen Rambow

Analysis of Foreign Language Teaching Methods: An Automatic Readability Approach

no code implementations • WS 2016 • Nasser Zalmout, Hind Saddiki, Nizar Habash

Much research in education has been done on the study of different language teaching methods.

Fine-Grained Arabic Dialect Identification

no code implementations • COLING 2018 • Mohammad Salameh, Houda Bouamor, Nizar Habash

Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification).

Classification, Dialect Identification (+3 more)

A Cross-lingual Messenger with Keyword Searchable Phrases for the Travel Domain

no code implementations • COLING 2018 • Shehroze Khan, Jihyun Kim, Tarik Zulfikarpasic, Peter Chen, Nizar Habash

We present Qutr (Query Translator), a smart cross-lingual communication application for the travel domain.

Machine Translation, Sentence (+1 more)

Machine Translation Evaluation for Arabic using Morphologically-enriched Embeddings

no code implementations • COLING 2016 • Francisco Guzmán, Houda Bouamor, Ramy Baly, Nizar Habash

Evaluation of machine translation (MT) into morphologically rich languages (MRL) has not been well studied despite posing many challenges.

Community Question Answering, Machine Translation (+6 more)

Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine

no code implementations • COLING 2016 • Ramy Eskander, Nizar Habash, Owen Rambow, Arfath Pasha

Arabic dialects present a special problem for natural language processing because there are few resources, they have no standard orthography, and have not been studied much.

Morphological Analysis

Botta: An Arabic Dialect Chatbot

no code implementations • COLING 2016 • Dana Abu Ali, Nizar Habash

This paper presents BOTTA, the first Arabic dialect chatbot.

Chatbot

YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer

no code implementations • COLING 2016 • Salam Khalifa, Nasser Zalmout, Nizar Habash

In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator.

Lemmatization, Morphological Analysis (+1 more)

CamelParser: A system for Arabic Syntactic Analysis and Morphological Disambiguation

no code implementations • COLING 2016 • Anas Shahrour, Salam Khalifa, Dima Taji, Nizar Habash

In this paper, we present CamelParser, a state-of-the-art system for Arabic syntactic dependency analysis aligned with contextually disambiguated morphological features.

Dependency Parsing, Morphological Analysis (+2 more)

A Parallel Corpus of Arabic-Japanese News Articles

no code implementations • LREC 2018 • Go Inoue, Nizar Habash, Yuji Matsumoto, Hiroyuki Aoyama

Machine Translation

Palmyra: A Platform Independent Dependency Annotation Tool for Morphologically Rich Languages

no code implementations • LREC 2018 • Talha Javed, Nizar Habash, Dima Taji

Dependency Parsing, Transliteration

A Leveled Reading Corpus of Modern Standard Arabic

no code implementations • LREC 2018 • Muhamed Al Khalil, Hind Saddiki, Nizar Habash, Latifa Alfalasi

Document Classification, Machine Translation (+4 more)

The MADAR Arabic Dialect Corpus and Lexicon

no code implementations • LREC 2018 • Houda Bouamor, Nizar Habash, Mohammad Salameh, Wajdi Zaghouani, Owen Rambow, Dana Abdulrahim, Ossama Obeid, Salam Khalifa, Fadhl Eryani, Alexander Erdmann, Kemal Oflazer

Transliteration

Unified Guidelines and Resources for Arabic Dialect Orthography

no code implementations • LREC 2018 • Nizar Habash, Fadhl Eryani, Salam Khalifa, Owen Rambow, Dana Abdulrahim, Alexander Erdmann, Reem Faraj, Wajdi Zaghouani, Houda Bouamor, Nasser Zalmout, Sara Hassan, Faisal Al-Shargi, Sakhar Alkhereyf, Basma Abdulkareem, Ramy Eskander, Mohammad Salameh, Hind Saddiki

Speech Recognition, Transliteration

A Morphologically Annotated Corpus of Emirati Arabic

no code implementations • LREC 2018 • Salam Khalifa, Nizar Habash, Fadhl Eryani, Ossama Obeid, Dana Abdulrahim, Meera Al Kaabi

Lemmatization, Machine Translation (+2 more)

CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing

no code implementations • LREC 2018 • Amir More, Özlem Çetinoğlu, Çağrı Çöltekin, Nizar Habash, Benoît Sagot, Djamé Seddah, Dima Taji, Reut Tsarfaty

Dependency Parsing, Morphological Analysis (+1 more)

An Arabic Dependency Treebank in the Travel Domain

no code implementations • 29 Jan 2019 • Dima Taji, Jamila El Gizuli, Nizar Habash

In this paper we present a dependency treebank of travel domain sentences in Modern Standard Arabic.

Translation

Dependency Parsing of Modern Standard Arabic with Lexical and Inflectional Features

no code implementations • CL 2013 • Yuval Marton, Nizar Habash, Owen Rambow

Dependency Parsing

Unsupervised Morphology-Based Vocabulary Expansion

no code implementations • ACL 2014 • Mohammad Sadegh Rasooli, Thomas Lippincott, Nizar Habash, Owen Rambow

Language Modelling, Machine Translation (+3 more)

Generalized Character-Level Spelling Error Correction

no code implementations • ACL 2014 • Noura Farra, Nadi Tomeh, Alla Rozovskaya, Nizar Habash

Machine Translation, Sentiment Analysis (+2 more)

Sentence Level Dialect Identification for Machine Translation System Selection

no code implementations • ACL 2014 • Wael Salloum, Heba Elfardy, Linda Alamir-Salloum, Nizar Habash, Mona Diab

Dialect Identification, Machine Translation (+2 more)

Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

no code implementations • ACL 2013 • Ahmed El Kholy, Nizar Habash, Gregor Leusch, Evgeny Matusov, Hassan Sawaf

Machine Translation, Sentence (+1 more)

Reranking with Linguistic and Semantic Features for Arabic Optical Character Recognition

no code implementations • ACL 2013 • Nadi Tomeh, Nizar Habash, Ryan Roth, Noura Farra, Pradeep Dasigi, Mona Diab

Language Modelling, Learning-To-Rank (+5 more)

Identifying Broken Plurals, Irregular Gender, and Rationality in Arabic Text

no code implementations • EACL 2012 • Sarah Alkuhlani, Nizar Habash

Dialectal Arabic to English Machine Translation: Pivoting through Modern Standard Arabic

no code implementations • NAACL 2013 • Wael Salloum, Nizar Habash

Language Modelling, Machine Translation (+2 more)

Morphological Analysis and Disambiguation for Dialectal Arabic

no code implementations • NAACL 2013 • Nizar Habash, Ryan Roth, Owen Rambow, Ramy Eskander, Nadi Tomeh

Lemmatization, Machine Translation (+3 more)

Automatic Morphological Enrichment of a Morphologically Underspecified Treebank

no code implementations • NAACL 2013 • Sarah Alkuhlani, Nizar Habash, Ryan Roth

Chunking, Machine Translation (+2 more)

Processing Spontaneous Orthography

no code implementations • NAACL 2013 • Ramy Eskander, Nizar Habash, Owen Rambow, Nadi Tomeh

Machine Translation

Arabic Dialect Processing Tutorial

no code implementations • NAACL 2012 • Mona Diab, Nizar Habash

Machine Translation, Speech Recognition

Predicting the Structure of Cooking Recipes

no code implementations • EMNLP 2015 • Jermsak Jermsurawong, Nizar Habash

Improving Arabic Diacritization through Syntactic Analysis

no code implementations • EMNLP 2015 • Anas Shahrour, Salam Khalifa, Nizar Habash

Morphological Analysis, Morphological Tagging (+3 more)

Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora

no code implementations • EMNLP 2013 • Ramy Eskander, Nizar Habash, Owen Rambow

Morphological Analysis

Automatic Transliteration of Romanized Dialectal Arabic

no code implementations • WS 2014 • Mohamed Al-Badrashiny, Ramy Eskander, Nizar Habash, Owen Rambow

Language Modelling, Spelling Correction (+1 more)

The Illinois-Columbia System in the CoNLL-2014 Shared Task

no code implementations • WS 2014 • Alla Rozovskaya, Kai-Wei Chang, Mark Sammons, Dan Roth, Nizar Habash

Grammatical Error Correction

Correction Annotation for Non-Native Arabic Texts: Guidelines and Corpus

no code implementations • WS 2015 • Wajdi Zaghouani, Nizar Habash, Houda Bouamor, Alla Rozovskaya, Behrang Mohit, Abeer Heider, Kemal Oflazer

Language Acquisition

The Second QALB Shared Task on Automatic Text Correction for Arabic

no code implementations • WS 2015 • Alla Rozovskaya, Houda Bouamor, Nizar Habash, Wajdi Zaghouani, Ossama Obeid, Behrang Mohit

Machine Translation

POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools

no code implementations • WS 2015 • Ahmed Hamdi, Alexis Nasr, Nizar Habash, Núria Gala

POS, POS Tagging

A Conventional Orthography for Algerian Arabic

no code implementations • WS 2015 • Houda Saadane, Nizar Habash

Annotating Targets of Opinions in Arabic using Crowdsourcing

no code implementations • WS 2015 • Noura Farra, Kathy McKeown, Nizar Habash

Fine-Grained Opinion Analysis, Subjectivity Analysis

Building a Corpus for Palestinian Arabic: a Preliminary Study

no code implementations • WS 2014 • Mustafa Jarrar, Nizar Habash, Diyam Akra, Nasser Zalmout

The First QALB Shared Task on Automatic Text Correction for Arabic

no code implementations • WS 2014 • Behrang Mohit, Alla Rozovskaya, Nizar Habash, Wajdi Zaghouani, Ossama Obeid

Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus

no code implementations • WS 2014 • Ann Bies, Zhiyi Song, Mohamed Maamouri, Stephen Grimes, Haejoong Lee, Jonathan Wright, Stephanie Strassel, Nizar Habash, Ramy Eskander, Owen Rambow

Transliteration

A Pipeline Approach to Supervised Error Correction for the QALB-2014 Shared Task

no code implementations • WS 2014 • Nadi Tomeh, Nizar Habash, Ramy Eskander, Joseph Le Roux

Grammatical Error Correction, Language Modelling (+1 more)

The Columbia System in the QALB-2014 Shared Task on Arabic Error Correction

no code implementations • WS 2014 • Alla Rozovskaya, Nizar Habash, Ramy Eskander, Noura Farra, Wael Salloum

Grammatical Error Correction

A Large Scale Arabic Sentiment Lexicon for Arabic Opinion Mining

no code implementations • WS 2014 • Gilbert Badaro, Ramy Baly, Hazem Hajj, Nizar Habash, Wassim El-Hajj

Opinion Mining, Sentiment Analysis

Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic

no code implementations • WS 2014 • Serena Jeblee, Weston Feely, Houda Bouamor, Alon Lavie, Nizar Habash, Kemal Oflazer

Machine Translation, Translation

Foreign Words and the Automatic Processing of Arabic Social Media Text Written in Roman Script

no code implementations • WS 2014 • Ramy Eskander, Mohamed Al-Badrashiny, Nizar Habash, Owen Rambow

Sentiment Analysis, Transliteration

INVITED TALK 1: Computational Processing of Arabic Dialects

no code implementations • WS 2014 • Nizar Habash

Machine Translation, Morphological Analysis

Automatic Correction and Extension of Morphological Annotations

no code implementations • WS 2013 • Ramy Eskander, Nizar Habash, Ann Bies, Seth Kulick, Mohamed Maamouri

POS, POS Tagging

SPMRL'13 Shared Task System: The CADIM Arabic Dependency Parser

no code implementations • WS 2013 • Yuval Marton, Nizar Habash, Owen Rambow, Sarah Alkuhlani

Dependency Parsing, Transliteration

Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages

no code implementations • WS 2013 • Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, Eric Villemonte de la Clergerie

Rich Morphology Generation Using Statistical Machine Translation

no code implementations • WS 2012 • Ahmed El Kholy, Nizar Habash

Language Modelling, Machine Translation (+2 more)

A Morphological Analyzer for Egyptian Arabic

no code implementations • WS 2012 • Nizar Habash, Ramy Eskander, Abdelati Hawwari

Elissa: A Dialectal to Standard Arabic Machine Translation System

no code implementations • COLING 2012 • Wael Salloum, Nizar Habash

Machine Translation, Morphological Analysis (+1 more)

Orthographic and Morphological Processing for Persian-to-English Statistical Machine Translation

no code implementations • IJCNLP 2013 • Mohammad Sadegh Rasooli, Ahmed El Kholy, Nizar Habash

Translation, Transliteration

Selective Combination of Pivot and Direct Statistical Machine Translation Models

no code implementations • IJCNLP 2013 • Ahmed El Kholy, Nizar Habash, Gregor Leusch, Evgeny Matusov, Hassan Sawaf

Machine Translation, Translation

A Web-based Annotation Framework For Large-Scale Text Correction

no code implementations • IJCNLP 2013 • Ossama Obeid, Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Kemal Oflazer, Nadi Tomeh

Machine Translation

DIRA: Dialectal Arabic Information Retrieval Assistant

no code implementations • IJCNLP 2013 • Arfath Pasha, Mohamed Al-Badrashiny, Mohamed Altantawy, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan M. Roth, Mona Diab

Information Retrieval, Retrieval (+1 more)

Developing an Egyptian Arabic Treebank: Impact of Dialectal Morphology on Annotation and Tool Development

no code implementations • LREC 2014 • Mohamed Maamouri, Ann Bies, Seth Kulick, Michael Ciul, Nizar Habash, Ramy Eskander

This paper describes the parallel development of an Egyptian Arabic Treebank and a morphological analyzer for Egyptian Arabic (CALIMA).

Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon

no code implementations • LREC 2014 • Mona Diab, Mohamed Al-Badrashiny, Maryam Aminian, Mohammed Attia, Heba Elfardy, Nizar Habash, Abdelati Hawwari, Wael Salloum, Pradeep Dasigi, Ramy Eskander

Multiple levels of quality checks are performed on the output of each step in the creation process.

POS

A Conventional Orthography for Tunisian Arabic

no code implementations • LREC 2014 • Inès Zribi, Rahma Boujelbane, Abir Masmoudi, Mariem Ellouze, Lamia Belguith, Nizar Habash

Tunisian Arabic is a dialect of the Arabic language spoken in Tunisia.

Language Modelling, Speech Recognition (+2 more)

A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition

no code implementations • LREC 2014 • Abir Masmoudi, Mariem Ellouze Khmekhem, Yannick Estève, Lamia Hadrich Belguith, Nizar Habash

In this paper we describe an effort to create a corpus and phonetic dictionary for Tunisian Arabic Automatic Speech Recognition (ASR).

Arabic Speech Recognition, Automatic Speech Recognition (+2 more)

A Multidialectal Parallel Corpus of Arabic

no code implementations • LREC 2014 • Houda Bouamor, Nizar Habash, Kemal Oflazer

The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic.

Dialect Identification, Machine Translation (+1 more)

MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic

no code implementations • LREC 2014 • Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan Roth

In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007).

Chunking, Lemmatization (+5 more)

Large Scale Arabic Error Annotation: Guidelines and Framework

no code implementations • LREC 2014 • Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Ossama Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah Alkuhlani, Kemal Oflazer

Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

Machine Translation

Conventional Orthography for Dialectal Arabic

no code implementations • LREC 2012 • Nizar Habash, Mona Diab, Owen Rambow

Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world.

Speech Recognition

Translating verbs between MSA and Arabic dialects through deep morphological analysis (Un système de traduction de verbes entre arabe standard et arabe dialectal par analyse morphologique profonde) [in French]

no code implementations • JEPTALNRECITAL 2013 • Ahmed Hamdi, Rahma Boujelbane, Nizar Habash, Alexis Nasr

Morphological Analysis

ADIDA: Automatic Dialect Identification for Arabic

no code implementations • NAACL 2019 • Ossama Obeid, Mohammad Salameh, Houda Bouamor, Nizar Habash

This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text.

Dialect Identification

DALILA: The Dialectal Arabic Linguistic Learning Assistant

no code implementations • LREC 2016 • Salam Khalifa, Houda Bouamor, Nizar Habash

Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP).

Paper

Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic

no code implementations • LREC 2016 • Faisal Al-Shargi, Aidan Kaplan, Ramy Eskander, Nizar Habash, Owen Rambow

We present new language resources for Moroccan and Sanaani Yemeni Arabic.

Paper

Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation

no code implementations • LREC 2016 • Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer

We present our guidelines and annotation procedure to create a human-corrected, machine-translated, post-edited corpus for Modern Standard Arabic.

Machine Translation Translation

Paper

Applying the Cognitive Machine Translation Evaluation Approach to Arabic

no code implementations • LREC 2016 • Irina Temnikova, Wajdi Zaghouani, Stephan Vogel, Nizar Habash

The goal of the cognitive machine translation (MT) evaluation approach is to build classifiers which assign post-editing effort scores to new texts.

Machine Translation Translation

Paper

SPLIT: Smart Preprocessing (Quasi) Language Independent Tool

no code implementations • LREC 2016 • Mohamed Al-Badrashiny, Arfath Pasha, Mona Diab, Nizar Habash, Owen Rambow, Wael Salloum, Ramy Eskander

Text preprocessing is an important and necessary task for all NLP applications.

Paper

Exploiting Arabic Diacritization for High Quality Automatic Annotation

no code implementations • LREC 2016 • Nizar Habash, Anas Shahrour, Muhamed Al-Khalil

We present a novel technique for Arabic morphological annotation.

LEMMA Vocal Bursts Intensity Prediction

Paper

Arabic Corpora for Credibility Analysis

no code implementations • LREC 2016 • Ayman Al Zaatari, Rim El Ballouli, Shady ELbassouni, Wassim El-Hajj, Hazem Hajj, Khaled Shaban, Nizar Habash, Emad Yahya

We focus on Arabic due to the recent popularity of blogs and microblogs in the Arab World and due to the lack of any such public corpora in Arabic.

BIG-bench Machine Learning General Classification

Paper

The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation

no code implementations • WS 2019 • Mai Oudah, Amjad Almahairi, Nizar Habash

Neural networks have become the state-of-the-art approach for machine translation (MT) in many languages.

Machine Translation Translation

Paper

Simple Automatic Post-editing for Arabic-Japanese Machine Translation

no code implementations • 14 Jul 2019 • Ella Noll, Mai Oudah, Nizar Habash

A common bottleneck for developing machine translation (MT) systems for some language pairs is the lack of direct parallel translation data sets, in general and in certain domains.

Automatic Post-Editing Translation

Paper

The Effectiveness of Simple Hybrid Systems for Hypernym Discovery

no code implementations • ACL 2019 • William Held, Nizar Habash

Hypernymy modeling has largely been separated according to two paradigms: pattern-based methods and distributional methods.

Hypernym Discovery

Paper

Automatic Gender Identification and Reinflection in Arabic

no code implementations • WS 2019 • Nizar Habash, Houda Bouamor, Christine Chung

The impressive progress in many Natural Language Processing (NLP) applications has increased the awareness of some of the biases these NLP systems have with regard to gender identities.

Machine Translation Translation

Paper

A Little Linguistics Goes a Long Way: Unsupervised Segmentation with Limited Language Specific Guidance

no code implementations • WS 2019 • Alexander Erdmann, Salam Khalifa, Mai Oudah, Nizar Habash, Houda Bouamor

We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input.

Paper

Morphologically Annotated Corpora for Seven Arabic Dialects: Taizi, Sanaani, Najdi, Jordanian, Syrian, Iraqi and Moroccan

no code implementations • WS 2019 • Faisal Alshargi, Shahd Dibas, Sakhar Alkhereyf, Reem Faraj, Basmah Abdulkareem, Sane Yagi, Ouafaa Kacha, Nizar Habash, Owen Rambow

These corpora will be publicly available to serve as benchmarks for training and evaluating systems for Arabic dialect morphological analysis and disambiguation.

Morphological Analysis

Paper

The MADAR Shared Task on Arabic Fine-Grained Dialect Identification

no code implementations • WS 2019 • Houda Bouamor, Sabit Hassan, Nizar Habash

In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification.

Dialect Identification

Paper

Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging

no code implementations • ACL 2020 • Nasser Zalmout, Nizar Habash

Semitic languages can be highly ambiguous, having several interpretations of the same surface forms, and morphologically rich, having many morphemes that realize several morphological features.

Lemmatization Morphological Tagging

Paper

Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling

no code implementations • ACL 2019 • Nasser Zalmout, Nizar Habash

In this paper we explore the use of multitask learning and adversarial training to address morphological richness and dialectal variations in the context of full morphological tagging.

Morphological Tagging Transfer Learning

Paper

The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems

no code implementations • LREC 2020 • Alberto Chierici, Nizar Habash, Margarita Bicec

The first challenges are to define a sensible methodology for data collection and to create useful data sets for training the system to retrieve the best answer to a user's question.

Question Answering Retrieval

Paper

A Large-Scale Leveled Readability Lexicon for Standard Arabic

no code implementations • LREC 2020 • Muhamed Al Khalil, Nizar Habash, Zhengyang Jiang

We present a large-scale 26,000-lemma leveled readability lexicon for Modern Standard Arabic.

LEMMA

Paper

Morphological Analysis and Disambiguation for Gulf Arabic: The Interplay between Resources and Methods

no code implementations • LREC 2020 • Salam Khalifa, Nasser Zalmout, Nizar Habash

In this paper we present the first full morphological analysis and disambiguation system for Gulf Arabic.

Morphological Analysis Morphological Disambiguation +1

Paper

A Spelling Correction Corpus for Multiple Arabic Dialects

no code implementations • LREC 2020 • Fadhl Eryani, Nizar Habash, Houda Bouamor, Salam Khalifa

In this paper, we present the MADAR CODA Corpus, a collection of 10,000 sentences from five Arabic city dialects (Beirut, Cairo, Doha, Rabat, and Tunis) represented in the Conventional Orthography for Dialectal Arabic (CODA) in parallel with their raw original form.

Spelling Correction

Paper

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

no code implementations • COLING (WANLP) 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

The data for the shared task covers a total of 100 provinces from 21 Arab countries and was collected from the Twitter domain.

Dialect Identification

Paper

A Panoramic Survey of Natural Language Processing in the Arab World

no code implementations • 25 Nov 2020 • Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Huseein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak

The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.

Machine Translation Optical Character Recognition +6

Paper

Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations

no code implementations • COLING 2020 • Yash Kankanampati, Joseph Le Roux, Nadi Tomeh, Dima Taji, Nizar Habash

In this paper we present a parsing model for projective dependency trees that takes advantage of complementary dependency annotations, as is the case in Arabic, where both CATiB and UD treebanks are available.

Dependency Parsing

Paper

Utilizing Subword Entities in Character-Level Sequence-to-Sequence Lemmatization Models

no code implementations • COLING 2020 • Nasser Zalmout, Nizar Habash

In addition to generic n-gram embeddings (using FastText), we experiment with concatenative (stems) and templatic (roots and patterns) morphological subwords.

LEMMA Lemmatization

Paper

An Online Readability Leveled Arabic Thesaurus

no code implementations • COLING 2020 • Zhengyang Jiang, Nizar Habash, Muhamed Al Khalil

This demo paper introduces the online Readability Leveled Arabic Thesaurus interface.

Paper

The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses

no code implementations • LREC 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor

Much of the research on this issue has focused on mitigating gender bias in English NLP models and systems.

Machine Translation Text Generation +1

Paper

A Cloud-based User-Centered Time-Offset Interaction Application

no code implementations • SIGDIAL (ACL) 2021 • Alberto Chierici, Tyeece Kiana Fredorcia Hensley, Wahib Kamran, Kertu Koss, Armaan Agrawal, Erin Meekhof, Goffredo Puccetti, Nizar Habash

Time-offset interaction applications (TOIA) allow simulating conversations with people who have previously recorded relevant video utterances, which are played in response to their interacting user.

Paper

SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

no code implementations • ACL (SIGMORPHON) 2021 • Tiago Pimentel, Maria Ryskina, Sabrina J. Mielke, Shijie Wu, Eleanor Chodroff, Brian Leonard, Garrett Nicolai, Yustinus Ghanggo Ate, Salam Khalifa, Nizar Habash, Charbel El-Khaissi, Omer Goldman, Michael Gasser, William Lane, Matt Coler, Arturo Oncevay, Jaime Rafael Montoya Samame, Gema Celeste Silva Villegas, Adam Ek, Jean-Philippe Bernardy, Andrey Shcherbakov, Aziyana Bayyr-ool, Karina Sheifer, Sofya Ganieva, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Andrew Krizhanovsky, Natalia Krizhanovsky, Clara Vania, Sardana Ivanova, Aelita Salchak, Christopher Straughn, Zoey Liu, Jonathan North Washington, Duygu Ataman, Witold Kieraś, Marcin Woliński, Totok Suhardijanto, Niklas Stoehr, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Richard J. Hatcher, Emily Prud'hommeaux, Ritesh Kumar, Mans Hulden, Botond Barta, Dorina Lakatos, Gábor Szolnok, Judit Ács, Mohit Raj, David Yarowsky, Ryan Cotterell, Ben Ambridge, Ekaterina Vylomova

This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features.

Paper

A Unified Model for Arabizi Detection and Transliteration using Sequence-to-Sequence Models

no code implementations • COLING (WANLP) 2020 • Ali Shazal, Aiza Usman, Nizar Habash

While online Arabic is primarily written using the Arabic script, a Roman-script variety called Arabizi is often seen on social media.

Transliteration

Paper

A View From the Crowd: Evaluation Challenges for Time-Offset Interaction Applications

no code implementations • EACL (HumEval) 2021 • Alberto Chierici, Nizar Habash

Our contributions include the annotated dataset, which we make publicly available, and the proposal of Success Rate @k as an evaluation metric that is more appropriate than traditional QA and information retrieval metrics.

Question Answering

Paper

AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization

no code implementations • 21 Mar 2022 • Moussa Kamal Eddine, Nadi Tomeh, Nizar Habash, Joseph Le Roux, Michalis Vazirgiannis

Like most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora.

Abstractive Text Summarization Natural Language Understanding

Paper

UniMorph 4.0: Universal Morphology

no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Paper

Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation

no code implementations • 25 May 2022 • Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

Human judgements show that using a predictive model results in more natural CS sentences than the random approach.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper

ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus

no code implementations • LREC 2022 • Nizar Habash, David Palfreyman

We present ZAEBUC, an annotated Arabic-English bilingual writer corpus comprising short essays by first-year university students at Zayed University in the United Arab Emirates.

Lemmatization Part-Of-Speech Tagging +2

Paper

The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic

no code implementations • LREC 2022 • Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa, Nizar Habash

Our objective is to create a specialized corpus of the Bahraini Arabic dialect, which includes written texts as well as transcripts of audio files, belonging to different genres (folktales, comedy shows, plays, cooking shows, etc.).

Paper

Camel Treebank: An Open Multi-genre Arabic Dependency Treebank

no code implementations • LREC 2022 • Nizar Habash, Muhammed AbuOdeh, Dima Taji, Reem Faraj, Jamila El Gizuli, Omar Kallas

We present the Camel Treebank (CAMELTB), a 188K-word open-source dependency treebank of Modern Standard and Classical Arabic.

Paper

Hierarchical Aggregation of Dialectal Data for Arabic Dialect Identification

no code implementations • LREC 2022 • Nurpeiis Baimukan, Houda Bouamor, Nizar Habash

We test the value of such aggregation by building language models and using them in dialect identification.

Dialect Identification

Paper

AraSAS: The Open Source Arabic Semantic Tagger

no code implementations • OSACT (LREC) 2022 • Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson, Nizar Habash

This paper presents AraSAS, the first open-source Arabic semantic analysis tagging system.

TAG

Paper

Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text

no code implementations • 11 Oct 2022 • Marwa Gaser, Manuel Mager, Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

For extreme low-resource scenarios, a combination of frequency- and morphology-based segmentations is shown to perform best.

Machine Translation Segmentation

Paper

The User-Aware Arabic Gender Rewriter

no code implementations • 14 Oct 2022 • Bashar Alhafni, Ossama Obeid, Nizar Habash

We introduce the User-Aware Arabic Gender Rewriter, a user-centric web-based system for Arabic gender rewriting in contexts involving two users.

Paper

Arabic Word-level Readability Visualization for Assisted Text Simplification

no code implementations • 19 Oct 2022 • Reem Hazim, Hind Saddiki, Bashar Alhafni, Muhamed Al Khalil, Nizar Habash

This demo paper presents a Google Docs add-on for automatic Arabic word-level readability visualization.

Lemmatization Text Simplification

Paper

Maknuune: A Large Open Palestinian Arabic Lexicon

no code implementations • 24 Oct 2022 • Shahd Dibas, Christian Khairallah, Nizar Habash, Omar Fayez Sadi, Tariq Sairafy, Karmel Sarabta, Abrar Ardah

We present Maknuune, a large open lexicon for the Palestinian Arabic dialect.

Paper

The Shared Task on Gender Rewriting

no code implementations • 22 Oct 2022 • Bashar Alhafni, Nizar Habash, Houda Bouamor, Ossama Obeid, Sultan Alrowili, Daliyah AlZeer, Khawlah M. Alshanqiti, Ahmed ElBakry, Muhammad ElNokrashy, Mohamed Gabr, Abderrahmane Issam, Abdelrahim Qaddoumi, K. Vijay-Shanker, Mahmoud Zyate

In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop.

Sentence

Paper

ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

no code implementations • 22 Nov 2022 • Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus.

Machine Translation Translation

Paper

Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition

no code implementations • 22 Nov 2022 • Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali

Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper

Camelira: An Arabic Multi-Dialect Morphological Disambiguator

no code implementations • 30 Nov 2022 • Ossama Obeid, Go Inoue, Nizar Habash

We present Camelira, a web-based Arabic multi-dialect morphological disambiguation tool that covers four major variants of Arabic: Modern Standard Arabic, Egyptian, Gulf, and Levantine.

Dialect Identification Morphological Disambiguation

Paper

Perception, performance, and detectability of conversational artificial intelligence across 32 university courses

no code implementations • 7 May 2023 • Hazem Ibrahim, Fengyuan Liu, Rohail Asim, Balaraju Battu, Sidahmed Benabderrahmane, Bashar Alhafni, Wifag Adnan, Tuka Alhanai, Bedoor AlShebli, Riyadh Baghdadi, Jocelyn J. Bélanger, Elena Beretta, Kemal Celik, Moumena Chaqfeh, Mohammed F. Daqaq, Zaynab El Bernoussi, Daryl Fougnie, Borja Garcia de Soto, Alberto Gandolfi, Andras Gyorgy, Nizar Habash, J. Andrew Harris, Aaron Kaufman, Lefteris Kirousis, Korhan Kocak, Kangsan Lee, Seungah S. Lee, Samreen Malik, Michail Maniatakos, David Melcher, Azzam Mourad, Minsu Park, Mahmoud Rasras, Alicja Reuben, Dania Zantout, Nancy W. Gleason, Kinga Makovi, Talal Rahwan, Yasir Zaki

Moreover, current AI-text classifiers cannot reliably detect ChatGPT's use in school work, due to their propensity to classify human-written answers as AI-generated, as well as the ease with which AI-generated text can be edited to evade detection.

Paper

Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

no code implementations • 23 Oct 2023 • Injy Hamed, Nizar Habash, Ngoc Thang Vu

Linguistic theories and random lexical replacement prove effective in the absence of CSW parallel data, with both approaches achieving similar results.

Data Augmentation Machine Translation +2

Paper

NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task

no code implementations • 24 Oct 2023 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash

We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023).

Dialect Identification Machine Translation +1

Paper

Computational Morphology and Lexicography Modeling of Modern Standard Arabic Nominals

no code implementations • 1 Feb 2024 • Christian Khairallah, Reham Marzouk, Salam Khalifa, Mayar Nassar, Nizar Habash

Modern Standard Arabic (MSA) nominals present many morphological and lexical modeling challenges that have not been consistently addressed previously.

Paper

M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

no code implementations • 17 Feb 2024 • Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels.

Task 2 Text Detection

Paper

ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus

no code implementations • 27 Mar 2024 • Injy Hamed, Fadhl Eryani, David Palfreyman, Nizar Habash

We present ZAEBUC-Spoken, a multilingual multidialectal Arabic-English speech corpus.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper

SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

no code implementations • 22 Apr 2024 • Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

The task attracted a large number of participants: subtask A monolingual (126), subtask A multilingual (59), subtask B (70), and subtask C (30).

Binary Classification Text Detection

Paper