Search Results for author: Muhammad Abdul-Mageed

Found 95 papers, 26 papers with code

Linguistically-Motivated Yorùbá-English Machine Translation

no code implementations COLING 2022 Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg

In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns in Yorùbá into English.

Machine Translation NMT +1

Interplay of Machine Translation, Diacritics, and Diacritization

no code implementations 9 Apr 2024 Wei-Rui Chen, Ife Adebara, Muhammad Abdul-Mageed

We examine these two questions in both high-resource (HR) and low-resource (LR) settings across 55 different languages (36 African languages and 19 European languages).

Machine Translation Multi-Task Learning +1

Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks

1 code implementation 1 Mar 2024 Fakhraddin Alwajih, El Moatez Billah Nagoudi, Gagan Bhatia, Abdelrahman Mohamed, Muhammad Abdul-Mageed

Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension.

Visual Reasoning

FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models

no code implementations 16 Feb 2024 Gagan Bhatia, El Moatez Billah Nagoudi, Hasan Cavusoglu, Muhammad Abdul-Mageed

We introduce FinTral, a suite of state-of-the-art multimodal large language models (LLMs) built upon the Mistral-7b model and tailored for financial analysis.

Decision Making Retrieval

Cheetah: Natural Language Generation for 517 African Languages

no code implementations 2 Jan 2024 Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed

The findings of this study contribute to advancing NLP research in low-resource settings, enabling greater accessibility and inclusion for African languages in a rapidly expanding digital landscape.

Language Modelling Text Generation

Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction

no code implementations 13 Dec 2023 Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Our best model achieves a new SOTA on Arabic GEC, with $73.29$ and $73.26$ F$_{1}$ on the 2014 and 2015 QALB datasets, respectively, compared to peer-reviewed published baselines.

Few-Shot Learning Grammatical Error Correction +1

Arabic Fine-Grained Entity Recognition

no code implementations 26 Oct 2023 Haneen Liqreina, Mustafa Jarrar, Mohammed Khalilia, Ahmed Oumar El-Shangiti, Muhammad Abdul-Mageed

To compute the baselines of WojoodFine, we fine-tune three pre-trained Arabic BERT encoders in three settings (flat NER, nested NER, and nested NER with subtypes), achieving F1 scores of 0.920, 0.866, and 0.885, respectively.

NER

LLM Performance Predictors are good initializers for Architecture Search

no code implementations 25 Oct 2023 Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Dujian Ding

We show that HS-NAS performs very similarly to SOTA NAS across benchmarks, reduces search hours by roughly 50%, and in some cases improves latency, GFLOPs, and model size.

Machine Translation Neural Architecture Search

Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation

no code implementations 24 Oct 2023 AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

While many researchers have proposed models and solutions for individual problems, there is an acute shortage of a comprehensive Arabic natural language generation toolkit that is capable of handling a wide range of tasks.

Text Generation

The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages

1 code implementation 23 Oct 2023 Chiyu Zhang, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed

We evaluate the performance of various multilingual pretrained language models (e.g., mT5) and instruction-tuned LLMs (e.g., BLOOMZ, ChatGPT) on SPARROW through fine-tuning, zero-shot, and/or few-shot learning.

Emotion Recognition Few-Shot Learning

ChatGPT for Arabic Grammatical Error Correction

no code implementations 8 Aug 2023 Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks.

Few-Shot Learning Grammatical Error Correction +1

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

no code implementations 5 Jun 2023 Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed

Whisper, the recently developed multilingual weakly supervised model, is reported to perform well on multiple speech recognition benchmarks in both monolingual and multilingual settings.

Arabic Speech Recognition Benchmarking +2

On the Robustness of Arabic Speech Dialect Identification

no code implementations 1 Jun 2023 Peter Sullivan, AbdelRahim Elmadany, Muhammad Abdul-Mageed

As these pipelines require application of ADI tools to potentially out-of-domain data, we aim to investigate how vulnerable the tools may be to this domain shift.

Dialect Identification Self-Supervised Learning +3

GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP

no code implementations 24 May 2023 Md Tawkat Islam Khondaker, Abdul Waheed, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Although we further explore and confirm the utility of employing GPT-4 as a potential alternative for human evaluation, our work adds to a growing body of research underscoring the limitations of ChatGPT.

Natural Language Understanding

Dolphin: A Challenging and Diverse Benchmark for Arabic NLG

no code implementations 24 May 2023 El Moatez Billah Nagoudi, AbdelRahim Elmadany, Ahmed El-Shangiti, Muhammad Abdul-Mageed

We present Dolphin, a novel benchmark that addresses the need for a natural language generation (NLG) evaluation framework dedicated to the wide collection of Arabic languages and varieties.

Dialogue Generation Machine Translation +3

UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis

no code implementations 21 Apr 2023 Gagan Bhatia, Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed

We describe our contribution to the SemEval 2023 AfriSenti-SemEval shared task, where we tackle the task of sentiment analysis in 14 different African languages.

Sentiment Analysis Transfer Learning

JASMINE: Arabic GPT Models for Few-Shot Learning

no code implementations 21 Dec 2022 El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, AbdelRahim Elmadany, Alcides Alcoba Inciarte, Md Tawkat Islam Khondaker

Scholarship on generative pretraining (GPT) remains acutely Anglocentric, leaving serious gaps in our understanding of the whole class of autoregressive models.

Few-Shot Learning

SERENGETI: Massively Multilingual Language Models for Africa

no code implementations 21 Dec 2022 Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Alcides Alcoba Inciarte

Multilingual pretrained language models (mPLMs) acquire valuable, generalizable linguistic information during pretraining and have advanced the state of the art on task-specific finetuning.

Language Modelling Natural Language Understanding

ORCA: A Challenging Benchmark for Arabic Language Understanding

no code implementations 21 Dec 2022 AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Due to their crucial role in all NLP, several benchmarks have been proposed to evaluate pretrained language models.

Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning

no code implementations 11 Nov 2022 Md Tawkat Islam Khondaker, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

The prevalence of abusive language on different online platforms has been a major concern that raises the need for automated cross-platform abusive language detection.

Abusive Language Contrastive Learning +2

Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints

no code implementations 6 Oct 2022 Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo Henrique de Rosa, Shital Shah

In this work, we study the more challenging open-domain setting consisting of low-frequency user prompt patterns (or broad prompts, e.g., a prompt about the 93rd Academy Awards) and demonstrate the effectiveness of character-based language models.

Inductive Bias

Improving Neural Machine Translation of Indigenous Languages with Multilingual Transfer Learning

no code implementations 14 May 2022 Wei-Rui Chen, Muhammad Abdul-Mageed

Machine translation (MT) involving Indigenous languages, including those possibly endangered, is challenging due to lack of sufficient parallel data.

Data Augmentation Machine Translation +2

Decay No More: A Persistent Twitter Dataset for Learning Social Meaning

1 code implementation 10 Apr 2022 Chiyu Zhang, Muhammad Abdul-Mageed, El Moatez Billah Nagoudi

With the proliferation of social media, many studies resort to social media to construct datasets for developing social meaning understanding systems.

Automatic Detection of Entity-Manipulated Text using Factual Knowledge

1 code implementation ACL 2022 Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

We propose a neural network based detector that detects manipulated news articles by reasoning about the facts mentioned in the article.

Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go

no code implementations ACL 2022 Ife Adebara, Muhammad Abdul-Mageed

Aligning with the ACL 2022 special theme on "Language Diversity: from Low Resource to Endangered Languages", we discuss the major linguistic and sociopolitical challenges facing the development of NLP technologies for African languages.

Contrastive Learning of Sociopragmatic Meaning in Social Media

1 code implementation 15 Mar 2022 Chiyu Zhang, Muhammad Abdul-Mageed, Ganesh Jawahar

Recent progress in representation and contrastive learning in NLP has not widely considered the class of sociopragmatic meaning (i.e., meaning in interaction within different language communities).

Contrastive Learning

Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning

no code implementations 1 Oct 2021 Toshiko Shibano, Xinyi Zhang, Mia Taige Li, Haejin Cho, Peter Sullivan, Muhammad Abdul-Mageed

To address the performance gap of English ASR models on L2 English speakers, we evaluate fine-tuning of pretrained wav2vec 2.0 models (Baevski et al., 2020; Xu et al., 2021) on L2-ARCTIC, a non-native English speech corpus (Zhao et al., 2018), under different training settings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Machine Translation of Low-Resource Indo-European Languages

no code implementations WMT (EMNLP) 2021 Wei-Rui Chen, Muhammad Abdul-Mageed

In this work, we investigate methods for the challenging task of translating between low-resource language pairs that exhibit some level of similarity.

Low-Resource Neural Machine Translation Transfer Learning +1

Improving Similar Language Translation With Transfer Learning

no code implementations WMT (EMNLP) 2021 Ife Adebara, Muhammad Abdul-Mageed

We investigate transfer learning based on pre-trained neural machine translation models to translate between (low-resource) similar languages.

Machine Translation Transfer Learning +1

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

no code implementations ACL 2021 Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi

To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation.

XLM-R

AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking

1 code implementation NAACL (NLP4IF) 2021 Tariq Alhindi, Amal Alabdulkarim, Ali Alshehri, Muhammad Abdul-Mageed, Preslav Nakov

With the continuing spread of misinformation and disinformation online, it is of increasing importance to develop combating mechanisms at scale in the form of automated systems that support multiple languages.

Fact Checking Misinformation +1

Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings

2 code implementations 7 Mar 2021 Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg

In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns in Yorùbá into English.

Machine Translation NMT +1

NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task

1 code implementation EACL (WANLP) 2021 Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2).

Dialect Identification

Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling

1 code implementation EACL 2021 Muhammad Khalifa, Muhammad Abdul-Mageed, Khaled Shaalan

We propose to self-train pre-trained language models in zero- and few-shot scenarios to improve performance on data-scarce varieties using only resources from data-rich ones.

Language Modelling NER +2

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

2 code implementations 27 Dec 2020 Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi

To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation.

XLM-R

Machine Generation and Detection of Arabic Manipulated and Fake News

1 code implementation COLING (WANLP) 2020 El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Tariq Alhindi, Hasan Cavusoglu

Finally, we develop the first models for detecting manipulated Arabic news and achieve state-of-the-art results on Arabic fake news detection (macro F1=70.06).

Fake News Detection POS

Automatic Detection of Machine Generated Text: A Critical Survey

1 code implementation COLING 2020 Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

Detectors that can distinguish text generated by text generative models (TGMs) from human-written text play a vital role in mitigating such misuse of TGMs.

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

no code implementations COLING (WANLP) 2020 Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

The data for the shared task cover a total of 100 provinces from 21 Arab countries and are collected from the Twitter domain.

Dialect Identification

Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments

1 code implementation EMNLP 2020 Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Lyle Ungar

Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties.

Dialect Identification Language Modelling +1

Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation

no code implementations WS 2020 El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, Hasan Cavusoglu

We describe our submission to the 2020 Duolingo Shared Task on Simultaneous Translation And Paraphrase for Language Education (STAPLE) (Mayhew et al., 2020).

Machine Translation Translation

Leveraging Affective Bidirectional Transformers for Offensive Language Detection

no code implementations LREC 2020 AbdelRahim Elmadany, Chiyu Zhang, Muhammad Abdul-Mageed, Azadeh Hashemi

Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech.

Data Augmentation Feature Engineering +1

Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media

no code implementations 2 Nov 2019 Muhammad Abdul-Mageed, Chiyu Zhang, Arun Rajendran, AbdelRahim Elmadany, Michael Przystupa, Lyle Ungar

In this work we exploit a newly-created Arabic dataset with ground truth age and gender labels to learn these attributes both individually and in a multi-task setting at the sentence level.

Multi-Task Learning Sentence

BERT-Based Arabic Social Media Author Profiling

no code implementations 9 Sep 2019 Chiyu Zhang, Muhammad Abdul-Mageed

We report our models for detecting age, language variety, and gender from social media data in the context of the Arabic author profiling and deception detection shared task (APDA).

Deception Detection

No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects

no code implementations WS 2019 Chiyu Zhang, Muhammad Abdul-Mageed

We present our deep learning system submitted to the MADAR shared task 2, focused on Twitter user dialect identification.

Dialect Identification Task 2

Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation

no code implementations WS 2019 Michael Przystupa, Muhammad Abdul-Mageed

We investigate the utility of neural machine translation on three low-resource, similar language pairs: Spanish-Portuguese, Czech-Polish, and Hindi-Nepali.

Machine Translation Translation

Happy Together: Learning and Understanding Appraisal From Natural Language

no code implementations 9 Jun 2019 Arun Rajendran, Chiyu Zhang, Muhammad Abdul-Mageed

In this paper, we explore various approaches for learning two types of appraisal components from happy language.

Machine Translation Translation

UBC-NLP at SemEval-2019 Task 4: Hyperpartisan News Detection With Attention-Based Bi-LSTMs

no code implementations SEMEVAL 2019 Chiyu Zhang, Arun Rajendran, Muhammad Abdul-Mageed

We present our deep learning models submitted to the SemEval-2019 Task 4 competition focused on Hyperpartisan News Detection.

Deep Learning the EEG Manifold for Phonological Categorization from Active Thoughts

no code implementations 8 Apr 2019 Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels

Speech-related Brain Computer Interfaces (BCI) aim primarily at finding an alternative vocal communication pathway for people with speaking disabilities.

Binary Classification EEG +2

SPEAK YOUR MIND! Towards Imagined Speech Recognition With Hierarchical Deep Learning

no code implementations 8 Apr 2019 Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels

Speech-related Brain Computer Interface (BCI) technologies provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals.

Brain Computer Interface General Classification +3

Deep Models for Arabic Dialect Identification on Benchmarked Data

no code implementations COLING 2018 Mohamed Elaraby, Muhammad Abdul-Mageed

We treat these two limitations: We (1) benchmark the data, and (2) empirically test 6 different deep learning methods on the task, comparing performance to several classical machine learning models under different conditions (i.e., both binary and multi-way classification).

Dialect Identification Machine Translation

EmoNet: Fine-Grained Emotion Detection with Gated Recurrent Neural Networks

no code implementations ACL 2017 Muhammad Abdul-Mageed, Lyle Ungar

Accurate detection of emotion from natural language has applications ranging from building emotional chatbots to better understanding individuals and their lives.

Decision Making

Not All Segments are Created Equal: Syntactically Motivated Sentiment Analysis in Lexical Space

no code implementations WS 2017 Muhammad Abdul-Mageed

Although there is by now a considerable amount of research on subjectivity and sentiment analysis on morphologically-rich languages, it is still unclear how lexical information can best be modeled in these languages.

Classification feature selection +3

SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis

no code implementations LREC 2014 Muhammad Abdul-Mageed, Mona Diab

The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values).

Arabic Sentiment Analysis Machine Translation

AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis

no code implementations LREC 2012 Muhammad Abdul-Mageed, Mona Diab

We present AWATIF, a multi-genre corpus of Modern Standard Arabic (MSA) labeled for subjectivity and sentiment analysis (SSA) at the sentence level.

Opinion Mining Sentence +1
