Search Results for author: Houda Bouamor

Found 44 papers, 4 papers with code

Gender-Aware Reinflection using Linguistically Enhanced Neural Models

1 code implementation GeBNLP (COLING) 2020 Bashar Alhafni, Nizar Habash, Houda Bouamor

In this paper, we present an approach for sentence-level gender reinflection using linguistically enhanced sequence-to-sequence models.

Grammatical Error Correction

User-Centric Gender Rewriting

1 code implementation 4 May 2022 Bashar Alhafni, Nizar Habash, Houda Bouamor

In this paper, we define the task of gender rewriting in contexts involving two users (I and/or You) - first and second grammatical persons with independent grammatical gender preferences.

The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses

no code implementations 18 Oct 2021 Bashar Alhafni, Nizar Habash, Houda Bouamor

Much of the research on this issue has focused on mitigating gender bias in English NLP models and systems.

Machine Translation Text Generation +1

The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models

1 code implementation EACL (WANLP) 2021 Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, Nizar Habash

In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models.

Language Modelling

NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task

1 code implementation EACL (WANLP) 2021 Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).

Dialect Identification

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

no code implementations COLING (WANLP) 2020 Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

The data for the shared task covers a total of 100 provinces from 21 Arab countries and are collected from the Twitter domain.

Dialect Identification

A Spelling Correction Corpus for Multiple Arabic Dialects

no code implementations LREC 2020 Fadhl Eryani, Nizar Habash, Houda Bouamor, Salam Khalifa

In this paper, we present the MADAR CODA Corpus, a collection of 10,000 sentences from five Arabic city dialects (Beirut, Cairo, Doha, Rabat, and Tunis) represented in the Conventional Orthography for Dialectal Arabic (CODA) in parallel with their raw original form.

Spelling Correction

The MADAR Shared Task on Arabic Fine-Grained Dialect Identification

no code implementations WS 2019 Houda Bouamor, Sabit Hassan, Nizar Habash

In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification.

Dialect Identification

A Little Linguistics Goes a Long Way: Unsupervised Segmentation with Limited Language Specific Guidance

no code implementations WS 2019 Alex Erdmann, Salam Khalifa, Mai Oudah, Nizar Habash, Houda Bouamor

We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input.

Automatic Gender Identification and Reinflection in Arabic

no code implementations WS 2019 Nizar Habash, Houda Bouamor, Christine Chung

The impressive progress in many Natural Language Processing (NLP) applications has increased the awareness of some of the biases these NLP systems have with regards to gender identities.

Machine Translation Translation

ADIDA: Automatic Dialect Identification for Arabic

no code implementations NAACL 2019 Ossama Obeid, Mohammad Salameh, Houda Bouamor, Nizar Habash

This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text.

Dialect Identification

Fine-Grained Arabic Dialect Identification

no code implementations COLING 2018 Mohammad Salameh, Houda Bouamor, Nizar Habash

Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification).

Classification Dialect Identification +3

Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic

no code implementations 18 Dec 2017 Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor

We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus.

Machine Translation Translation

Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation

no code implementations LREC 2016 Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer

We present our guidelines and annotation procedure to create a human corrected machine translated post-edited corpus for the Modern Standard Arabic.

Machine Translation Translation

DALILA: The Dialectal Arabic Linguistic Learning Assistant

no code implementations LREC 2016 Salam Khalifa, Houda Bouamor, Nizar Habash

Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP).

Guidelines and Framework for a Large Scale Arabic Diacritized Corpus

no code implementations LREC 2016 Wajdi Zaghouani, Houda Bouamor, Abdelati Hawwari, Mona Diab, Ossama Obeid, Mahmoud Ghoneim, Sawsan Alqahtani, Kemal Oflazer

This paper presents the annotation guidelines developed as part of an effort to create a large scale manually diacritized corpus for various Arabic text genres.

YouDACC: the Youtube Dialectal Arabic Comment Corpus

no code implementations LREC 2014 Ahmed Salama, Houda Bouamor, Behrang Mohit, Kemal Oflazer

This paper presents YOUDACC, an automatically annotated large-scale multi-dialectal Arabic corpus collected from user comments on Youtube videos.

A contrastive review of paraphrase acquisition techniques

no code implementations LREC 2012 Houda Bouamor, Aurélien Max, Gabriel Illouz, Anne Vilnat

This paper addresses the issue of what approach should be used for building a corpus of sentential paraphrases depending on one's requirements.

Information Retrieval Machine Translation
