no code implementations • COLING 2022 • Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg
In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns in Yorùbá into English.
no code implementations • 21 Dec 2022 • Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Alcides Alcoba Inciarte
Multilingual language models (MLMs) acquire valuable, generalizable linguistic information during pretraining and have advanced the state of the art on task-specific finetuning.
no code implementations • 21 Dec 2022 • El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, AbdelRahim Elmadany, Alcides Alcoba Inciarte, Md Tawkat Islam Khondaker
Task agnostic generative pretraining (GPT) has recently proved promising for zero- and few-shot learning, gradually diverting attention from the expensive supervised learning paradigm.
no code implementations • 21 Dec 2022 • AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed
Due to their crucial role in all NLP, several benchmarks have been proposed to evaluate pretrained language models.
no code implementations • 11 Nov 2022 • Md Tawkat Islam Khondaker, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan
The prevalence of abusive language on different online platforms has been a major concern that raises the need for automated cross-platform abusive language detection.
no code implementations • 22 Oct 2022 • Md Tawkat Islam Khondaker, El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan
Contrastive learning (CL) brought significant progress to various NLP tasks.
1 code implementation • 21 Oct 2022 • Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Alcides Alcoba Inciarte
Problematically, most of the world's 7000+ languages today are not covered by LID technologies.
1 code implementation • 18 Oct 2022 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash
We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI 2022).
1 code implementation • 14 Oct 2022 • Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah, Sebastien Bubeck, Jianfeng Gao
Motivated by the recent advances in sparsely activated models like the Mixture-of-Experts (MoE) model, we introduce sparse architectures with conditional computation into the NAS search space.
no code implementations • 6 Oct 2022 • Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo Henrique de Rosa, Shital Shah
Autocomplete is a task where the user inputs a piece of text, termed prompt, which is conditioned by the model to generate semantically coherent continuation.
1 code implementation • OSACT (LREC) 2022 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed
We present TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA).
no code implementations • 14 May 2022 • Wei-Rui Chen, Muhammad Abdul-Mageed
Machine translation (MT) involving Indigenous languages, including those possibly endangered, is challenging due to lack of sufficient parallel data.
1 code implementation • 10 Apr 2022 • Chiyu Zhang, Muhammad Abdul-Mageed, El Moatez Billah Nagoudi
With the proliferation of social media, many studies resort to social media to construct datasets for developing social meaning understanding systems.
no code implementations • ACL 2022 • Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan
We propose a neural network based detector that detects manipulated news articles by reasoning about the facts mentioned in the article.
no code implementations • ACL 2022 • Ife Adebara, Muhammad Abdul-Mageed
Aligning with ACL 2022 special Theme on "Language Diversity: from Low Resource to Endangered Languages", we discuss the major linguistic and sociopolitical challenges facing development of NLP technologies for African languages.
no code implementations • 15 Mar 2022 • Chiyu Zhang, Muhammad Abdul-Mageed, Ganesh Jawahar
Recent progress in representation and contrastive learning in NLP has not widely considered the class of sociopragmatic meaning (i.e., meaning in interaction within different language communities).
1 code implementation • 10 Feb 2022 • Peter Sullivan, Toshiko Shibano, Muhammad Abdul-Mageed
ASR systems designed for native English (L1) usually underperform on non-native English (L2).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 1 Oct 2021 • Toshiko Shibano, Xinyi Zhang, Mia Taige Li, Haejin Cho, Peter Sullivan, Muhammad Abdul-Mageed
To address the performance gap of English ASR models on L2 English speakers, we evaluate fine-tuning of pretrained wav2vec 2.0 models (Baevski et al., 2020; Xu et al., 2021) on L2-ARCTIC, a non-native English speech corpus (Zhao et al., 2018) under different training settings.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • ACL 2022 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed
For evaluation, we introduce a novel benchmark for ARabic language GENeration (ARGEN), covering seven important tasks.
no code implementations • WMT (EMNLP) 2021 • Wei-Rui Chen, Muhammad Abdul-Mageed
In this work, we investigate methods for the challenging task of translating between low-resource language pairs that exhibit some level of similarity.
Low-Resource Neural Machine Translation
Transfer Learning
no code implementations • WMT (EMNLP) 2021 • Ife Adebara, Muhammad Abdul-Mageed
We investigate transfer learning based on pre-trained neural machine translation models to translate between (low-resource) similar languages.
no code implementations • ACL 2021 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi
To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation.
1 code implementation • WASSA (ACL) 2022 • Chiyu Zhang, Muhammad Abdul-Mageed
We test our models on 15 different Twitter datasets for social meaning detection.
no code implementations • NAACL (CALCS) 2021 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed
Our work is in the context of the Shared Task on Machine Translation in Code-Switching.
no code implementations • NAACL (CALCS) 2021 • Ganesh Jawahar, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan
We describe models focused on the understudied problem of translating between monolingual and code-mixed language pairs.
1 code implementation • NAACL (NLP4IF) 2021 • Tariq Alhindi, Amal Alabdulkarim, Ali Alshehri, Muhammad Abdul-Mageed, Preslav Nakov
With the continuing spread of misinformation and disinformation online, it is of increasing importance to develop combating mechanisms at scale in the form of automated systems that support multiple languages.
no code implementations • NAACL (AmericasNLP) 2021 • El Moatez Billah Nagoudi, Wei-Rui Chen, Muhammad Abdul-Mageed, Hasan Cavusoglu
Transformer language models have become fundamental components of natural language processing based pipelines.
2 code implementations • 7 Mar 2021 • Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg
In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns in Yorùbá into English.
1 code implementation • EACL (WANLP) 2021 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash
This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).
1 code implementation • EACL 2021 • Muhammad Khalifa, Muhammad Abdul-Mageed, Khaled Shaalan
We propose to self-train pre-trained language models in zero- and few-shot scenarios to improve performance on data-scarce varieties using only resources from data-rich ones.
2 code implementations • 27 Dec 2020 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi
To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation.
1 code implementation • EACL (WANLP) 2021 • Muhammad Abdul-Mageed, Shady Elbassuoni, Jad Doughman, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Yorgo Zoughby, Ahmad Shaher, Iskander Gaba, Ahmed Helal, Mohammed El-Razzaz
We describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word embedding.
1 code implementation • COLING (WANLP) 2020 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Tariq Alhindi, Hasan Cavusoglu
Finally, we develop the first models for detecting manipulated Arabic news and achieve state-of-the-art results on Arabic fake news detection (macro F1=70.06).
1 code implementation • COLING 2020 • Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan
Detectors that can distinguish text generated by TGM from human written text play a vital role in mitigating such misuse of TGMs.
no code implementations • COLING (WANLP) 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash
The data for the shared task covers a total of 100 provinces from 21 Arab countries and are collected from the Twitter domain.
1 code implementation • EMNLP 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Lyle Ungar
Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties.
no code implementations • WS 2020 • Kaili Vesik, Muhammad Abdul-Mageed, Miikka Silfverberg
The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and synthesis.
no code implementations • WS 2020 • El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, Hasan Cavusoglu
We describe our submission to the 2020 Duolingo Shared Task on Simultaneous Translation And Paraphrase for Language Education (STAPLE) (Mayhew et al., 2020).
no code implementations • LREC 2020 • AbdelRahim Elmadany, Chiyu Zhang, Muhammad Abdul-Mageed, Azadeh Hashemi
Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech.
no code implementations • LREC 2020 • Ali Alshehri, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed
Social media communication has become a significant part of daily activity in modern societies.
1 code implementation • EACL 2021 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Dinesh Pabbi, Kunal Verma, Rannie Lin
We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19.
1 code implementation • LREC 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, Azadeh Hashemi, El Moatez Billah Nagoudi
We describe AraNet, a collection of deep learning Arabic social media processing tools.
no code implementations • 2 Nov 2019 • Muhammad Abdul-Mageed, Chiyu Zhang, Arun Rajendran, AbdelRahim Elmadany, Michael Przystupa, Lyle Ungar
In this work we exploit a newly-created Arabic dataset with ground truth age and gender labels to learn these attributes both individually and in a multi-task setting at the sentence level.
no code implementations • 31 Oct 2019 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Arun Rajendran, Lyle Ungar
Prediction of language varieties and dialects is an important language processing task, with a wide range of applications.
no code implementations • 9 Sep 2019 • Chiyu Zhang, Muhammad Abdul-Mageed
We report our models for detecting age, language variety, and gender from social media data in the context of the Arabic author profiling and deception detection shared task (APDA).
no code implementations • 8 Sep 2019 • Chiyu Zhang, Muhammad Abdul-Mageed
Supervised deep learning requires large amounts of training data.
no code implementations • WS 2019 • Michael Przystupa, Muhammad Abdul-Mageed
We investigate the utility of neural machine translation on three low-resource, similar language pairs: Spanish–Portuguese, Czech–Polish, and Hindi–Nepali.
no code implementations • WS 2019 • Chiyu Zhang, Muhammad Abdul-Mageed
We present our deep learning system submitted to MADAR shared task 2, focused on Twitter user dialect identification.
no code implementations • 9 Jun 2019 • Arun Rajendran, Chiyu Zhang, Muhammad Abdul-Mageed
In this paper, we explore various approaches for learning two types of appraisal components from happy language.
no code implementations • 9 Jun 2019 • Arun Rajendran, Chiyu Zhang, Muhammad Abdul-Mageed
We examine learning offensive content on Twitter with limited, imbalanced data.
no code implementations • SEMEVAL 2019 • Arun Rajendran, Chiyu Zhang, Muhammad Abdul-Mageed
We examine learning offensive content on Twitter with limited, imbalanced data.
no code implementations • SEMEVAL 2019 • Chiyu Zhang, Arun Rajendran, Muhammad Abdul-Mageed
We present our deep learning models submitted to the SemEval-2019 Task 4 competition focused on Hyperpartisan News Detection.
no code implementations • 8 Apr 2019 • Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels
Speech-related Brain Computer Interface (BCI) technologies provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals.
no code implementations • 8 Apr 2019 • Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels
Speech-related Brain Computer Interfaces (BCI) aim primarily at finding an alternative vocal communication pathway for people with speaking disabilities.
no code implementations • WS 2018 • Hassan Alhuzali, Mohamed Elaraby, Muhammad Abdul-Mageed
We also offer an analysis of system performance and the impact of training data size on the task.
no code implementations • COLING 2018 • Mohamed Elaraby, Muhammad Abdul-Mageed
We treat these two limitations: We (1) benchmark the data, and (2) empirically test 6 different deep learning methods on the task, comparing performance to several classical machine learning models under different conditions (i.e., both binary and multi-way classification).
no code implementations • WS 2018 • Hassan Alhuzali, Muhammad Abdul-Mageed, Lyle Ungar
The computational treatment of emotion in natural language text remains relatively limited, and Arabic is no exception.
no code implementations • ACL 2017 • Muhammad Abdul-Mageed, Lyle Ungar
Accurate detection of emotion from natural language has applications ranging from building emotional chatbots to better understanding individuals and their lives.
no code implementations • WS 2017 • Muhammad Abdul-Mageed
Although there is by now a considerable amount of research on subjectivity and sentiment analysis on morphologically-rich languages, it is still unclear how lexical information can best be modeled in these languages.
no code implementations • LREC 2014 • Muhammad Abdul-Mageed, Mona Diab
The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values).
no code implementations • LREC 2012 • Muhammad Abdul-Mageed, Mona Diab
We present AWATIF, a multi-genre corpus of Modern Standard Arabic (MSA) labeled for subjectivity and sentiment analysis (SSA) at the sentence level.