Search Results for author: Hamdy Mubarak

Found 67 papers, 6 papers with code

Farasa: A New Fast and Accurate Arabic Word Segmenter

no code implementations LREC 2016 Kareem Darwish, Hamdy Mubarak

Meanwhile, Farasa is nearly one order of magnitude faster than QATARA and two orders of magnitude faster than MADAMIRA.

valid

Arabic to English Person Name Transliteration using Twitter

no code implementations LREC 2016 Hamdy Mubarak, Ahmed Abdelali

We present a novel approach for mining data from Twitter for the purpose of building transliteration resources and systems.

Retrieval Translation +1

The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition

no code implementations 19 Sep 2016 Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang

For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website, Aljazeera.net, over a 10-year duration (2000-2011).

Acoustic Modelling Language Modelling +1

Arabic Diacritization: Stats, Rules, and Hacks

no code implementations WS 2017 Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali

In this paper, we present a new and fast state-of-the-art Arabic diacritizer that guesses the diacritics of words and then their case endings.

Part-Of-Speech Tagging Transliteration +1

Arabic POS Tagging: Don't Abandon Feature Engineering Just Yet

no code implementations WS 2017 Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki

However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVM-Rank based tagger yields further improvements.

Feature Engineering Named Entity Recognition (NER) +4

A Neural Architecture for Dialectal Arabic Segmentation

no code implementations WS 2017 Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish

The automated processing of Arabic Dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general.

Machine Translation Morphological Analysis +2

Abusive Language Detection on Arabic Social Media

no code implementations WS 2017 Hamdy Mubarak, Kareem Darwish, Walid Magdy

We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean).

Abusive Language General Classification

Learning from Relatives: Unified Dialectal Arabic Segmentation

no code implementations CONLL 2017 Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer

Arabic dialects do not just share a common koiné, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other.

Dialect Identification Information Retrieval +2

Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM

2 code implementations 19 Aug 2017 Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy Mubarak, Kareem Darwish, Laura Kallmeyer

Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval.

Ranked #1 on Sentiment Analysis on DynaSent (using extra training data)

Domain Adaptation Information Retrieval +5

Build Fast and Accurate Lemmatization for Arabic

no code implementations LREC 2018 Hamdy Mubarak

In this paper, we describe the complexity of building a lemmatizer for Arabic, which has a rich and complex derivational morphology, and we discuss the need for fast and accurate lemmatization to enhance Arabic Information Retrieval (IR) results.

Information Retrieval Lemmatization +1

Diacritization of Maghrebi Arabic Sub-Dialects

no code implementations 15 Oct 2018 Ahmed Abdelali, Mohammed Attia, Younes Samih, Kareem Darwish, Hamdy Mubarak

The diacritization process attempts to restore the short vowels in written Arabic text, which are typically omitted.

POS Tagging for Improving Code-Switching Identification in Arabic

no code implementations WS 2019 Mohammed Attia, Younes Samih, Ali Elkahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish

When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language.

POS POS Tagging

QC-GO Submission for MADAR Shared Task: Arabic Fine-Grained Dialect Identification

no code implementations WS 2019 Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed Eldesouki, Kareem Darwish

This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification).

Dialect Identification

A System for Diacritizing Four Varieties of Arabic

no code implementations IJCNLP 2019 Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Mohamed Eldesouki, Younes Samih, Hassan Sajjad

Short vowels, aka diacritics, are more often omitted when writing different varieties of Arabic including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA).

Feature Engineering

Arabic Diacritic Recovery Using a Feature-Rich biLSTM Model

no code implementations 4 Feb 2020 Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Mohamed Eldesouki

Our model surpasses all previous state-of-the-art systems with a CW error rate (CWER) of 2.86% and a CE error rate (CEER) of 3.7% for Modern Standard Arabic (MSA) and CWER of 2.2% and CEER of 2.5% for Classical Arabic (CA).

Feature Engineering

Arabic Offensive Language on Twitter: Analysis and Experiments

no code implementations EACL (WANLP) 2021 Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali

Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization.

Overview of OSACT4 Arabic Offensive Language Detection Shared Task

no code implementations LREC 2020 Hamdy Mubarak, Kareem Darwish, Walid Magdy, Tamer Elsayed, Hend Al-Khalifa

This paper provides an overview of the offensive language detection shared task at the 4th workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4).

Arabic Dialect Identification in the Wild

no code implementations 13 May 2020 Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.

Dialect Identification

Fighting the COVID-19 Infodemic in Social Media: A Holistic Perspective and a Call to Arms

1 code implementation 15 Jul 2020 Firoj Alam, Fahim Dalvi, Shaden Shaar, Nadir Durrani, Hamdy Mubarak, Alex Nikolov, Giovanni Da San Martino, Ahmed Abdelali, Hassan Sajjad, Kareem Darwish, Preslav Nakov

With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information including statistics, warnings, advice, and inspirational stories.

Misinformation

ALT at SemEval-2020 Task 12: Arabic and English Offensive Language Identification in Social Media

no code implementations SEMEVAL 2020 Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali

This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media.

Language Identification

Arabic Curriculum Analysis

no code implementations COLING 2020 Hamdy Mubarak, Shimaa Amer, Ahmed Abdelali, Kareem Darwish

Developing a platform that analyzes the content of curricula can help identify their shortcomings and whether they are tailored to specific desired outcomes.

ArCorona: Analyzing Arabic Tweets in the Early Days of Coronavirus (COVID-19) Pandemic

no code implementations EACL (Louhi) 2021 Hamdy Mubarak, Sabit Hassan

Over the past few months, there were huge numbers of circulating tweets and discussions about Coronavirus (COVID-19) in the Arab region.

Misinformation

Pre-Training BERT on Arabic Tweets: Practical Considerations

no code implementations 21 Feb 2021 Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih

The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.

ASAD: Arabic Social media Analytics and unDerstanding

no code implementations EACL 2021 Sabit Hassan, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish

This system demonstration paper describes ASAD: Arabic Social media Analysis and unDerstanding, a suite of seven individual modules that allows users to determine dialects, sentiment, news category, offensiveness, hate speech, adult content, and spam in Arabic tweets.

Automatic Expansion and Retargeting of Arabic Offensive Language Training

no code implementations 18 Nov 2021 Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih

Rampant use of offensive language on social media led to recent efforts on automatic identification of such language.

Emojis as Anchors to Detect Arabic Offensive Language and Hate Speech

no code implementations 18 Jan 2022 Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury

We evaluate our models on external datasets - a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube and Facebook, for assessing generalization capability.

Cultural Vocal Bursts Intensity Prediction

ArabGend: Gender Analysis and Inference on Arabic Twitter

no code implementations COLING (WNUT) 2022 Hamdy Mubarak, Shammur Absar Chowdhury, Firoj Alam

Gender analysis of Twitter can reveal important socio-cultural differences between male and female users.

NatiQ: An End-to-end Text-to-Speech System for Arabic

no code implementations 15 Jun 2022 Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish

We concatenated Tacotron1 with the WaveRNN vocoder, Tacotron2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms.

SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation

no code implementations 2 Nov 2022 Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali, Hamdy Mubarak, Shazia Afzal

Our proposed technique achieves state-of-the-art results on Speechocean762 for ASR-dependent mispronunciation detection models at the phoneme level, with a 2.0% gain in Pearson Correlation Coefficient (PCC) compared to the previous state-of-the-art [1].

Data Augmentation Multi-Task Learning +1

Overview of the WANLP 2022 Shared Task on Propaganda Detection in Arabic

no code implementations 18 Nov 2022 Firoj Alam, Hamdy Mubarak, Wajdi Zaghouani, Giovanni Da San Martino, Preslav Nakov

Thus, there has been a lot of recent research on automatic detection of propaganda techniques in text as well as in memes.

Propaganda detection

Detecting and Reasoning of Deleted Tweets before they are Posted

no code implementations 5 May 2023 Hamdy Mubarak, Samir Abdaljalil, Azza Nassar, Firoj Alam

Social media platforms empower us in several ways, from information dissemination to consumption.

QVoice: Arabic Speech Pronunciation Learning Application

no code implementations 9 May 2023 Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury, Hamdy Mubarak, Shazia Afzal, Ahmed Ali

This paper introduces QVoice, a novel Arabic pronunciation learning application powered by an end-to-end mispronunciation detection and feedback generation module.

ArAIEval Shared Task: Persuasion Techniques and Disinformation Detection in Arabic Text

no code implementations 6 Nov 2023 Maram Hasanain, Firoj Alam, Hamdy Mubarak, Samir Abdaljalil, Wajdi Zaghouani, Preslav Nakov, Giovanni Da San Martino, Abed Alhakim Freihat

We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023.

Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

no code implementations 7 Mar 2024 Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov

Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output.

Fact Checking Hallucination +1

Overview of OSACT5 Shared Task on Arabic Offensive Language and Hate Speech Detection

no code implementations OSACT (LREC) 2022 Hamdy Mubarak, Hend Al-Khalifa, Abdulmohsen Al-Thubaity

This paper provides an overview of the shared task on detecting offensive language, hate speech, and fine-grained hate speech at the fifth workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5).

Hate Speech Detection

UL2C: Mapping User Locations to Countries on Arabic Twitter

no code implementations EACL (WANLP) 2021 Hamdy Mubarak, Sabit Hassan

Mapping user locations to countries can be useful for many applications such as dialect identification, author profiling, and recommendation systems.

Dialect Identification

Adult Content Detection on Arabic Twitter: Analysis and Experiments

no code implementations EACL (WANLP) 2021 Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali

With Twitter being one of the most popular social media platforms in the Arab region, it is not surprising to find accounts that post adult content in Arabic tweets, despite the fact that these platforms dissuade users from such content.

QADI: Arabic Dialect Identification in the Wild

no code implementations EACL (WANLP) 2021 Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

For extrinsic evaluation, we are able to build effective country-level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.

Dialect Identification
