Search Results for author: Muhammad Abdul-Mageed

Found 95 papers, 26 papers with code

Linguistically-Motivated Yorùbá-English Machine Translation

no code implementations • COLING 2022 • Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg

In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns in Yorùbá into English.

Machine Translation NMT +1

Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English

no code implementations • ICNLSP 2021 • Toshiko Shibano, Xinyi Zhang, Mia Taige Li, Haejin Cho, Peter Sullivan, Muhammad Abdul-Mageed

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Interplay of Machine Translation, Diacritics, and Diacritization

no code implementations • 9 Apr 2024 • Wei-Rui Chen, Ife Adebara, Muhammad Abdul-Mageed

We examine these two questions in both high-resource (HR) and low-resource (LR) settings across 55 different languages (36 African languages and 19 European languages).

Machine Translation Multi-Task Learning +1

Distilling Text Style Transfer With Self-Explanation From LLMs

no code implementations • 2 Mar 2024 • Chiyu Zhang, Honglong Cai, Yuezhang Li, Yuexin Wu, Le Hou, Muhammad Abdul-Mageed

Text Style Transfer (TST) seeks to alter the style of text while retaining its core content.

In-Context Learning Knowledge Distillation +2

Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks

1 code implementation • 1 Mar 2024 • Fakhraddin Alwajih, El Moatez Billah Nagoudi, Gagan Bhatia, Abdelrahman Mohamed, Muhammad Abdul-Mageed

Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension.

Visual Reasoning

GreenLLaMA: A Framework for Detoxification with Explanations

no code implementations • 25 Feb 2024 • Md Tawkat Islam Khondaker, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

We then train a suite of detoxification models with our cross-platform corpus.

SPAR: Personalized Content-Based Recommendation via Long Engagement Attention

no code implementations • 16 Feb 2024 • Chiyu Zhang, Yifei Sun, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long

Leveraging users' long engagement histories is essential for personalized content recommendations.

Language Modelling Large Language Model

FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models

no code implementations • 16 Feb 2024 • Gagan Bhatia, El Moatez Billah Nagoudi, Hasan Cavusoglu, Muhammad Abdul-Mageed

We introduce FinTral, a suite of state-of-the-art multimodal large language models (LLMs) built upon the Mistral-7b model and tailored for financial analysis.

Decision Making Retrieval

Cheetah: Natural Language Generation for 517 African Languages

no code implementations • 2 Jan 2024 • Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed

The findings of this study contribute to advancing NLP research in low-resource settings, enabling greater accessibility and inclusion for African languages in a rapidly expanding digital landscape.

Language Modelling Text Generation

Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction

no code implementations • 13 Dec 2023 • Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Our best model achieves a new SOTA on Arabic GEC, with 73.29 and 73.26 F1 on the 2014 and 2015 QALB datasets, respectively, compared to peer-reviewed published baselines.

Few-Shot Learning Grammatical Error Correction +1

CalliPaint: Chinese Calligraphy Inpainting with Diffusion Model

no code implementations • 3 Dec 2023 • Qisheng Liao, Zhinuo Wang, Muhammad Abdul-Mageed, Gus Xia

Chinese calligraphy can be viewed as a unique form of visual art.

Image Inpainting

Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability

no code implementations • 16 Nov 2023 • Wei-Rui Chen, Ife Adebara, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed

However, the range of languages ChatGPT can handle remains largely a mystery.

Language Identification

Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder

no code implementations • 15 Nov 2023 • Abdelrahman Mohamed, Fakhraddin Alwajih, El Moatez Billah Nagoudi, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed

We also manually prepare a new dataset for evaluation.

Image Captioning Language Modelling

ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting

1 code implementation • 28 Oct 2023 • Abdellah El Mekki, Muhammad Abdul-Mageed, ElMoatez Billah Nagoudi, Ismail Berrada, Ahmed Khoumsi

We also demonstrate the effectiveness of ProMap in re-ranking results from other BLI methods such as with aligned static word embeddings.

Bilingual Lexicon Induction Language Modelling +4

Arabic Fine-Grained Entity Recognition

no code implementations • 26 Oct 2023 • Haneen Liqreina, Mustafa Jarrar, Mohammed Khalilia, Ahmed Oumar El-Shangiti, Muhammad Abdul-Mageed

To compute the baselines of WojoodFine, we fine-tune three pre-trained Arabic BERT encoders in three settings: flat NER, nested NER, and nested NER with subtypes, achieving F1 scores of 0.920, 0.866, and 0.885, respectively.

NER

LLM Performance Predictors are good initializers for Architecture Search

no code implementations • 25 Oct 2023 • Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Dujian Ding

We show that HS-NAS performs very similarly to SOTA NAS across benchmarks, reduces search hours by roughly 50%, and in some cases improves latency, GFLOPs, and model size.

Machine Translation Neural Architecture Search

NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task

no code implementations • 24 Oct 2023 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash

We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023).

Dialect Identification Machine Translation +1

Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation

no code implementations • 24 Oct 2023 • AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

While many researchers have proposed models and solutions for individual problems, there is an acute shortage of a comprehensive Arabic natural language generation toolkit that is capable of handling a wide range of tasks.

Text Generation

WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task

no code implementations • 24 Oct 2023 • Mustafa Jarrar, Muhammad Abdul-Mageed, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, Nagham Hamad, Alaa' Omar

The winning teams achieved F1 scores of 91.96 and 93.73 in FlatNER and NestedNER, respectively.

named-entity-recognition Named Entity Recognition +1

The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages

1 code implementation • 23 Oct 2023 • Chiyu Zhang, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed

We evaluate the performance of various multilingual pretrained language models (e.g., mT5) and instruction-tuned LLMs (e.g., BLOOMZ, ChatGPT) on SPARROW through fine-tuning, zero-shot, and/or few-shot learning.

Emotion Recognition Few-Shot Learning

VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System

no code implementations • 17 Oct 2023 • Abdul Waheed, Bashar Talafha, Peter Sullivan, AbdelRahim Elmadany, Muhammad Abdul-Mageed

We train a wide range of models such as HuBERT (DID), Whisper, and XLS-R (ASR) in a supervised setting for Arabic DID and ASR tasks.

Arabic Speech Recognition Automatic Speech Recognition +4

ChatGPT for Arabic Grammatical Error Correction

no code implementations • 8 Aug 2023 • Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks.

Few-Shot Learning Grammatical Error Correction +1

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

no code implementations • 6 Aug 2023 • Karima Kadaoui, Samar M. Magdy, Abdul Waheed, Md Tawkat Islam Khondaker, Ahmed Oumar El-Shangiti, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Our evaluation covers diverse Arabic varieties such as Classical Arabic (CA), Modern Standard Arabic (MSA), and several country-level dialectal variants.

Dialogue Generation Machine Translation +2

Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts

no code implementations • 8 Jun 2023 • Ganesh Jawahar, Haichuan Yang, Yunyang Xiong, Zechun Liu, Dilin Wang, Fei Sun, Meng Li, Aasish Pappu, Barlas Oguz, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Raghuraman Krishnamoorthi, Vikas Chandra

In addition, the proposed method achieves the SOTA performance in NAS for building fast machine translation models, yielding better latency-BLEU tradeoff compared to HAT, state-of-the-art NAS for MT.

Language Modelling Machine Translation +2

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

no code implementations • 5 Jun 2023 • Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed

Whisper, the recently developed multilingual weakly supervised model, is reported to perform well on multiple speech recognition benchmarks in both monolingual and multilingual settings.

Arabic Speech Recognition Benchmarking +2

On the Robustness of Arabic Speech Dialect Identification

no code implementations • 1 Jun 2023 • Peter Sullivan, AbdelRahim Elmadany, Muhammad Abdul-Mageed

As these pipelines require application of ADI tools to potentially out-of-domain data, we aim to investigate how vulnerable the tools may be to this domain shift.

Dialect Identification Self-Supervised Learning +3

GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP

no code implementations • 24 May 2023 • Md Tawkat Islam Khondaker, Abdul Waheed, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Although we further explore and confirm the utility of employing GPT-4 as a potential alternative for human evaluation, our work adds to a growing body of research underscoring the limitations of ChatGPT.

Natural Language Understanding

Dolphin: A Challenging and Diverse Benchmark for Arabic NLG

no code implementations • 24 May 2023 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Ahmed El-Shangiti, Muhammad Abdul-Mageed

We present Dolphin, a novel benchmark that addresses the need for a natural language generation (NLG) evaluation framework dedicated to the wide collection of Arabic languages and varieties.

Dialogue Generation Machine Translation +3

LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

1 code implementation • 27 Apr 2023 • Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji

The results demonstrate that our proposed LaMini-LM models are comparable to competitive baselines, while being much smaller in size.

Ranked #15 on Word Sense Disambiguation on Words in Context

Common Sense Reasoning Coreference Resolution +5

Zero-Shot Slot and Intent Detection in Low-Resource Languages

no code implementations • 26 Apr 2023 • Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed

Intent detection and slot filling are critical tasks in spoken and natural language understanding for task-oriented dialog systems.

Intent Detection Natural Language Understanding +2

UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis

no code implementations • 21 Apr 2023 • Gagan Bhatia, Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed

We describe our contribution to the SemEval 2023 AfriSenti-SemEval shared task, where we tackle the task of sentiment analysis in 14 different African languages.

Sentiment Analysis Transfer Learning

JASMINE: Arabic GPT Models for Few-Shot Learning

no code implementations • 21 Dec 2022 • El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, AbdelRahim Elmadany, Alcides Alcoba Inciarte, Md Tawkat Islam Khondaker

Scholarship on generative pretraining (GPT) remains acutely Anglocentric, leaving serious gaps in our understanding of the whole class of autoregressive models.

Few-Shot Learning

SERENGETI: Massively Multilingual Language Models for Africa

no code implementations • 21 Dec 2022 • Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Alcides Alcoba Inciarte

Multilingual pretrained language models (mPLMs) acquire valuable, generalizable linguistic information during pretraining and have advanced the state of the art on task-specific finetuning.

Language Modelling Natural Language Understanding

ORCA: A Challenging Benchmark for Arabic Language Understanding

no code implementations • 21 Dec 2022 • AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Due to their crucial role in all NLP, several benchmarks have been proposed to evaluate pretrained language models.

Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning

no code implementations • 11 Nov 2022 • Md Tawkat Islam Khondaker, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

The prevalence of abusive language on different online platforms has been a major concern that raises the need for automated cross-platform abusive language detection.

Abusive Language Contrastive Learning +2

A Benchmark Study of Contrastive Learning for Arabic Social Meaning

1 code implementation • 22 Oct 2022 • Md Tawkat Islam Khondaker, El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

Contrastive learning (CL) brought significant progress to various NLP tasks.

Contrastive Learning Dialect Identification +2

AfroLID: A Neural Language Identification Tool for African Languages

1 code implementation • 21 Oct 2022 • Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Alcides Alcoba Inciarte

Problematically, most of the world's 7000+ languages today are not covered by LID technologies.

Language Identification

NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task

1 code implementation • 18 Oct 2022 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI 2022).

Dialect Identification Sentiment Analysis +1

AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation

1 code implementation • 14 Oct 2022 • Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah, Sebastien Bubeck, Jianfeng Gao

Furthermore, existing MoE works do not consider computational constraints (e.g., FLOPs, latency) to guide their design.

Machine Translation Neural Architecture Search +1

Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints

no code implementations • 6 Oct 2022 • Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo Henrique de Rosa, Shital Shah

In this work, we study the more challenging open-domain setting consisting of low-frequency user prompt patterns (or broad prompts, e.g., a prompt about the 93rd Academy Awards) and demonstrate the effectiveness of character-based language models.

Inductive Bias

TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation

1 code implementation • OSACT (LREC) 2022 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed

We present TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA).

Machine Translation Semantic Similarity +2

Improving Neural Machine Translation of Indigenous Languages with Multilingual Transfer Learning

no code implementations • 14 May 2022 • Wei-Rui Chen, Muhammad Abdul-Mageed

Machine translation (MT) involving Indigenous languages, including those possibly endangered, is challenging due to lack of sufficient parallel data.

Data Augmentation Machine Translation +2

Decay No More: A Persistent Twitter Dataset for Learning Social Meaning

1 code implementation • 10 Apr 2022 • Chiyu Zhang, Muhammad Abdul-Mageed, El Moatez Billah Nagoudi

With the proliferation of social media, many studies resort to social media to construct datasets for developing social meaning understanding systems.

Automatic Detection of Entity-Manipulated Text using Factual Knowledge

1 code implementation • ACL 2022 • Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

We propose a neural network based detector that detects manipulated news articles by reasoning about the facts mentioned in the article.

Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go

no code implementations • ACL 2022 • Ife Adebara, Muhammad Abdul-Mageed

Aligning with the ACL 2022 special theme on "Language Diversity: from Low Resource to Endangered Languages", we discuss the major linguistic and sociopolitical challenges facing the development of NLP technologies for African languages.

Contrastive Learning of Sociopragmatic Meaning in Social Media

1 code implementation • 15 Mar 2022 • Chiyu Zhang, Muhammad Abdul-Mageed, Ganesh Jawahar

Recent progress in representation and contrastive learning in NLP has not widely considered the class of sociopragmatic meaning (i.e., meaning in interaction within different language communities).

Contrastive Learning

Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding

1 code implementation • 10 Feb 2022 • Peter Sullivan, Toshiko Shibano, Muhammad Abdul-Mageed

ASR systems designed for native English (L1) usually underperform on non-native English (L2).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning

no code implementations • 1 Oct 2021 • Toshiko Shibano, Xinyi Zhang, Mia Taige Li, Haejin Cho, Peter Sullivan, Muhammad Abdul-Mageed

To address the performance gap of English ASR models on L2 English speakers, we evaluate fine-tuning of pretrained wav2vec 2.0 models (Baevski et al., 2020; Xu et al., 2021) on L2-ARCTIC, a non-native English speech corpus (Zhao et al., 2018), under different training settings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

AraT5: Text-to-Text Transformers for Arabic Language Generation

1 code implementation • ACL 2022 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed

For evaluation, we introduce a novel benchmark for ARabic language GENeration (ARGEN), covering seven important tasks.

Text Generation Transfer Learning

Machine Translation of Low-Resource Indo-European Languages

no code implementations • WMT (EMNLP) 2021 • Wei-Rui Chen, Muhammad Abdul-Mageed

In this work, we investigate methods for the challenging task of translating between low-resource language pairs that exhibit some level of similarity.

Low-Resource Neural Machine Translation Transfer Learning +1

Improving Similar Language Translation With Transfer Learning

no code implementations • WMT (EMNLP) 2021 • Ife Adebara, Muhammad Abdul-Mageed

We investigate transfer learning based on pre-trained neural machine translation models to translate between (low-resource) similar languages.

Machine Translation Transfer Learning +1

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

no code implementations • ACL 2021 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi

To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation.

XLM-R

Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning

1 code implementation • WASSA (ACL) 2022 • Chiyu Zhang, Muhammad Abdul-Mageed

We test our models on 15 different Twitter datasets for social meaning detection.

Denoising Few-Shot Learning +1

Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation

no code implementations • NAACL (CALCS) 2021 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed

Our work is in the context of the Shared Task on Machine Translation in Code-Switching.

Language Modelling Machine Translation +2

Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing

no code implementations • NAACL (CALCS) 2021 • Ganesh Jawahar, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

We describe models focused on the understudied problem of translating between monolingual and code-mixed language pairs.

Language Modelling Machine Translation +1

AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking

1 code implementation • NAACL (NLP4IF) 2021 • Tariq Alhindi, Amal Alabdulkarim, Ali Alshehri, Muhammad Abdul-Mageed, Preslav Nakov

With the continuing spread of misinformation and disinformation online, it is of increasing importance to develop combating mechanisms at scale in the form of automated systems that support multiple languages.

Fact Checking Misinformation +1

IndT5: A Text-to-Text Transformer for 10 Indigenous Languages

no code implementations • NAACL (AmericasNLP) 2021 • El Moatez Billah Nagoudi, Wei-Rui Chen, Muhammad Abdul-Mageed, Hasan Cavusoglu

Transformer language models have become fundamental components of natural language processing based pipelines.

Language Modelling Machine Translation +1

Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings

2 code implementations • 7 Mar 2021 • Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg

In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns in Yorùbá into English.

Machine Translation NMT +1

NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task

1 code implementation • EACL (WANLP) 2021 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).

Dialect Identification

Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling

1 code implementation • EACL 2021 • Muhammad Khalifa, Muhammad Abdul-Mageed, Khaled Shaalan

We propose to self-train pre-trained language models in zero- and few-shot scenarios to improve performance on data-scarce varieties using only resources from data-rich ones.

Language Modelling NER +2

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

2 code implementations • 27 Dec 2020 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi

To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation.

XLM-R

DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings

1 code implementation • EACL (WANLP) 2021 • Muhammad Abdul-Mageed, Shady Elbassuoni, Jad Doughman, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Yorgo Zoughby, Ahmad Shaher, Iskander Gaba, Ahmed Helal, Mohammed El-Razzaz

We describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word embedding.

Word Embeddings

Machine Generation and Detection of Arabic Manipulated and Fake News

1 code implementation • COLING (WANLP) 2020 • El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Tariq Alhindi, Hasan Cavusoglu

Finally, we develop the first models for detecting manipulated Arabic news and achieve state-of-the-art results on Arabic fake news detection (macro F1 = 70.06).

Fake News Detection POS

Automatic Detection of Machine Generated Text: A Critical Survey

1 code implementation • COLING 2020 • Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

Detectors that can distinguish text generated by text generative models (TGMs) from human-written text play a vital role in mitigating misuse of TGMs.

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

no code implementations • COLING (WANLP) 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

The data for the shared task cover a total of 100 provinces from 21 Arab countries and are collected from the Twitter domain.

Dialect Identification

Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments

1 code implementation • EMNLP 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Lyle Ungar

Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties.

Dialect Identification Language Modelling +1

One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble

no code implementations • WS 2020 • Kaili Vesik, Muhammad Abdul-Mageed, Miikka Silfverberg

The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and synthesis.

speech-recognition Speech Recognition

Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation

no code implementations • WS 2020 • El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, Hasan Cavusoglu

We describe our submission to the 2020 Duolingo Shared Task on Simultaneous Translation And Paraphrase for Language Education (STAPLE) (Mayhew et al., 2020).

Machine Translation Translation

Leveraging Affective Bidirectional Transformers for Offensive Language Detection

no code implementations • LREC 2020 • AbdelRahim Elmadany, Chiyu Zhang, Muhammad Abdul-Mageed, Azadeh Hashemi

Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech.

Data Augmentation Feature Engineering +1

Understanding and Detecting Dangerous Speech in Social Media

no code implementations • LREC 2020 • Ali Alshehri, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Social media communication has become a significant part of daily activity in modern societies.

Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19

1 code implementation • EACL 2021 • Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Dinesh Pabbi, Kunal Verma, Rannie Lin

We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19.

Misinformation

AraNet: A Deep Learning Toolkit for Arabic Social Media

1 code implementation • LREC 2020 • Muhammad Abdul-Mageed, Chiyu Zhang, Azadeh Hashemi, El Moatez Billah Nagoudi

We describe AraNet, a collection of deep learning Arabic social media processing tools.

Feature Engineering

Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media

no code implementations • 2 Nov 2019 • Muhammad Abdul-Mageed, Chiyu Zhang, Arun Rajendran, AbdelRahim Elmadany, Michael Przystupa, Lyle Ungar

In this work we exploit a newly-created Arabic dataset with ground truth age and gender labels to learn these attributes both individually and in a multi-task setting at the sentence level.

Multi-Task Learning Sentence

DiaNet: BERT and Hierarchical Attention Multi-Task Learning of Fine-Grained Dialect

no code implementations • 31 Oct 2019 • Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Arun Rajendran, Lyle Ungar

Prediction of language varieties and dialects is an important language processing task, with a wide range of applications.

Dialect Identification Multi-Task Learning

BERT-Based Arabic Social Media Author Profiling

no code implementations • 9 Sep 2019 • Chiyu Zhang, Muhammad Abdul-Mageed

We report our models for detecting age, language variety, and gender from social media data in the context of the Arabic author profiling and deception detection shared task (APDA).

Deception Detection

Multi-Task Bidirectional Transformer Representations for Irony Detection

no code implementations • 8 Sep 2019 • Chiyu Zhang, Muhammad Abdul-Mageed

Supervised deep learning requires large amounts of training data.

Feature Engineering

No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects

no code implementations • WS 2019 • Chiyu Zhang, Muhammad Abdul-Mageed

We present our deep learning system submitted to the MADAR shared task 2, focused on Twitter user dialect identification.

Dialect Identification Task 2

Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation

no code implementations • WS 2019 • Michael Przystupa, Muhammad Abdul-Mageed

We investigate the utility of neural machine translation on three low-resource, similar language pairs: Spanish-Portuguese, Czech-Polish, and Hindi-Nepali.

Machine Translation Translation

UBC-NLP at SemEval-2019 Task 6: Ensemble Learning of Offensive Content With Enhanced Training Data

no code implementations • 9 Jun 2019 • Arun Rajendran, Chiyu Zhang, Muhammad Abdul-Mageed

We examine learning offensive content on Twitter with limited, imbalanced data.

Ensemble Learning

Happy Together: Learning and Understanding Appraisal From Natural Language

no code implementations • 9 Jun 2019 • Arun Rajendran, Chiyu Zhang, Muhammad Abdul-Mageed

In this paper, we explore various approaches for learning two types of appraisal components from happy language.

Machine Translation Translation

UBC-NLP at SemEval-2019 Task 6: Ensemble Learning of Offensive Content With Enhanced Training Data

no code implementations • SEMEVAL 2019 • Arun Rajendran, Chiyu Zhang, Muhammad Abdul-Mageed

We examine learning offensive content on Twitter with limited, imbalanced data.

Ensemble Learning

UBC-NLP at SemEval-2019 Task 4: Hyperpartisan News Detection With Attention-Based Bi-LSTMs

no code implementations • SEMEVAL 2019 • Chiyu Zhang, Arun Rajendran, Muhammad Abdul-Mageed

We present our deep learning models submitted to the SemEval-2019 Task 4 competition focused on Hyperpartisan News Detection.

Deep Learning the EEG Manifold for Phonological Categorization from Active Thoughts

no code implementations • 8 Apr 2019 • Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels

Speech-related Brain Computer Interfaces (BCI) aim primarily at finding an alternative vocal communication pathway for people with speaking disabilities.

Binary Classification EEG +2

SPEAK YOUR MIND! Towards Imagined Speech Recognition With Hierarchical Deep Learning

no code implementations • 8 Apr 2019 • Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels

Speech-related Brain Computer Interface (BCI) technologies provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals.

Brain Computer Interface General Classification +3

UBC-NLP at IEST 2018: Learning Implicit Emotion With an Ensemble of Language Models

no code implementations • WS 2018 • Hassan Alhuzali, Mohamed Elaraby, Muhammad Abdul-Mageed

We also offer an analysis of system performance and the impact of training data size on the task.

Language Modelling

Deep Models for Arabic Dialect Identification on Benchmarked Data

no code implementations • COLING 2018 • Mohamed Elaraby, Muhammad Abdul-Mageed

We address these two limitations: we (1) benchmark the data, and (2) empirically test 6 different deep learning methods on the task, comparing performance to several classical machine learning models under different conditions (i.e., both binary and multi-way classification).

Dialect Identification Machine Translation

Enabling Deep Learning of Emotion With First-Person Seed Expressions

no code implementations • WS 2018 • Hassan Alhuzali, Muhammad Abdul-Mageed, Lyle Ungar

The computational treatment of emotion in natural language text remains relatively limited, and Arabic is no exception.

Emotion Recognition Machine Translation

You Tweet What You Speak: A City-Level Dataset of Arabic Dialects

no code implementations • LREC 2018 • Muhammad Abdul-Mageed, Hassan Alhuzali, Mohamed Elaraby

EmoNet: Fine-Grained Emotion Detection with Gated Recurrent Neural Networks

no code implementations • ACL 2017 • Muhammad Abdul-Mageed, Lyle Ungar

Accurate detection of emotion from natural language has applications ranging from building emotional chatbots to better understanding individuals and their lives.

Decision Making

Not All Segments are Created Equal: Syntactically Motivated Sentiment Analysis in Lexical Space

no code implementations • WS 2017 • Muhammad Abdul-Mageed

Although there is by now a considerable amount of research on subjectivity and sentiment analysis on morphologically-rich languages, it is still unclear how lexical information can best be modeled in these languages.

Classification feature selection +3

Does 'well-being' translate on Twitter?

no code implementations • EMNLP 2016 • Laura Smith, Salvatore Giorgi, Rishi Solanki, Johannes Eichstaedt, H. Andrew Schwartz, Muhammad Abdul-Mageed, Anneke Buffone, Lyle Ungar

Machine Translation Sentiment Analysis

SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis

no code implementations • LREC 2014 • Muhammad Abdul-Mageed, Mona Diab

The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values).

Arabic Sentiment Analysis Machine Translation