Search Results for author: Kareem Darwish

Found 59 papers, 7 papers with code

Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey

1 code implementation 19 May 2020 Ammar Rashed, Mucahid Kutlu, Kareem Darwish, Tamer Elsayed, Cansin Bayrak

On June 24, 2018, Turkey conducted a highly consequential election in which the Turkish people elected their president and parliament in the first election under a new presidential system.

Clustering Sentence +2

Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM

2 code implementations 19 Aug 2017 Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy Mubarak, Kareem Darwish, Laura Kallmeyer

Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval.

Ranked #1 on Sentiment Analysis on DynaSent (using extra training data)

Domain Adaptation Information Retrieval +5

Unsupervised User Stance Detection on Twitter

2 code implementations 3 Apr 2019 Kareem Darwish, Peter Stefanov, Michaël J. Aupetit, Preslav Nakov

We experiment with different combinations of user similarity features, dataset sizes, dimensionality reduction methods, and clustering algorithms to ascertain the most effective and most computationally efficient combinations across three different datasets (in English and Turkish).

Social and Information Networks 62P25, 91D30

Fighting the COVID-19 Infodemic in Social Media: A Holistic Perspective and a Call to Arms

1 code implementation 15 Jul 2020 Firoj Alam, Fahim Dalvi, Shaden Shaar, Nadir Durrani, Hamdy Mubarak, Alex Nikolov, Giovanni Da San Martino, Ahmed Abdelali, Hassan Sajjad, Kareem Darwish, Preslav Nakov

With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information including statistics, warnings, advice, and inspirational stories.

Misinformation

Arabizi Detection and Conversion to Arabic

no code implementations WS 2014 Kareem Darwish

In this paper we address the problems of: identifying Arabizi in text and converting it to Arabic characters.

Language Modelling Transliteration

Diacritization of Maghrebi Arabic Sub-Dialects

no code implementations 15 Oct 2018 Ahmed Abdelali, Mohammed Attia, Younes Samih, Kareem Darwish, Hamdy Mubarak

The diacritization process attempts to restore the short vowels in written Arabic text, which are typically omitted.

Learning from Relatives: Unified Dialectal Arabic Segmentation

no code implementations CONLL 2017 Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer

Arabic dialects do not just share a common koiné, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other.

Dialect Identification Information Retrieval +2

Arabic Diacritization: Stats, Rules, and Hacks

no code implementations WS 2017 Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali

In this paper, we present a new and fast state-of-the-art Arabic diacritizer that guesses the diacritics of words and then their case endings.

Part-Of-Speech Tagging Transliteration +1

A Neural Architecture for Dialectal Arabic Segmentation

no code implementations WS 2017 Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish

The automated processing of Arabic Dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general.

Machine Translation Morphological Analysis +2

Arabic POS Tagging: Don't Abandon Feature Engineering Just Yet

no code implementations WS 2017 Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki

However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVM-Rank based tagger yields further improvements.

Feature Engineering Named Entity Recognition (NER) +4

Abusive Language Detection on Arabic Social Media

no code implementations WS 2017 Hamdy Mubarak, Kareem Darwish, Walid Magdy

We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean).

Abusive Language General Classification

Simple Effective Microblog Named Entity Recognition: Arabic as an Example

no code implementations LREC 2014 Kareem Darwish, Wei Gao

Despite many recent papers on Arabic Named Entity Recognition (NER) in the news domain, little work has been done on microblog NER.

Domain Adaptation Information Retrieval +3

Farasa: A New Fast and Accurate Arabic Word Segmenter

no code implementations LREC 2016 Kareem Darwish, Hamdy Mubarak

Meanwhile, Farasa is nearly one order of magnitude faster than QATARA and two orders of magnitude faster than MADAMIRA.

valid

Predicting the Topical Stance of Media and Popular Twitter Users

no code implementations 2 Jul 2019 Peter Stefanov, Kareem Darwish, Atanas Atanasov, Preslav Nakov

Discovering the stances of media outlets and influential people on current, debatable topics is important for social statisticians and policy makers.

POS Tagging for Improving Code-Switching Identification in Arabic

no code implementations WS 2019 Mohammed Attia, Younes Samih, Ali Elkahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish

When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language.

POS POS Tagging

QC-GO Submission for MADAR Shared Task: Arabic Fine-Grained Dialect Identification

no code implementations WS 2019 Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed Eldesouki, Kareem Darwish

This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification).

Dialect Identification

A System for Diacritizing Four Varieties of Arabic

no code implementations IJCNLP 2019 Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Mohamed Eldesouki, Younes Samih, Hassan Sajjad

Short vowels, aka diacritics, are more often omitted when writing different varieties of Arabic including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA).

Feature Engineering

Arabic Diacritic Recovery Using a Feature-Rich biLSTM Model

no code implementations 4 Feb 2020 Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Mohamed Eldesouki

Our model surpasses all previous state-of-the-art systems with a CW error rate (CWER) of 2.86% and a CE error rate (CEER) of 3.7% for Modern Standard Arabic (MSA) and CWER of 2.2% and CEER of 2.5% for Classical Arabic (CA).

Feature Engineering

Arabic Offensive Language on Twitter: Analysis and Experiments

no code implementations EACL (WANLP) 2021 Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali

Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization.

A Few Topical Tweets are Enough for Effective User-Level Stance Detection

no code implementations 7 Apr 2020 Younes Samih, Kareem Darwish

We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics.

Clustering General Classification +1

Arabic Dialect Identification in the Wild

no code implementations 13 May 2020 Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.

Dialect Identification

Embedding-based Qualitative Analysis of Polarization in Turkey

no code implementations 23 Sep 2019 Mucahid Kutlu, Kareem Darwish, Cansin Bayrak, Ammar Rashed, Tamer Elsayed

During the election period, the Turkish people extensively shared their political opinions on Twitter.

Social and Information Networks

Overview of OSACT4 Arabic Offensive Language Detection Shared Task

no code implementations LREC 2020 Hamdy Mubarak, Kareem Darwish, Walid Magdy, Tamer Elsayed, Hend Al-Khalifa

This paper provides an overview of the offensive language detection shared task at the 4th workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4).

Predicting the Topical Stance and Political Leaning of Media using Tweets

no code implementations ACL 2020 Peter Stefanov, Kareem Darwish, Atanas Atanasov, Preslav Nakov

Discovering the stances of media outlets and influential people on current, debatable topics is important for social statisticians and policy makers.

Political Framing: US COVID19 Blame Game

no code implementations 19 Jul 2020 Chereen Shurafa, Kareem Darwish, Wajdi Zaghouani

Through the use of Twitter, framing has become a prominent presidential campaign tool for politically active users.

Quantifying Polarization on Twitter: the Kavanaugh Nomination

no code implementations 5 Jan 2020 Kareem Darwish

This paper addresses polarization quantification, particularly as it pertains to the nomination of Brett Kavanaugh to the US Supreme Court and his subsequent confirmation with the narrowest margin since 1881.

Social and Information Networks J.4

BERT Transformer model for Detecting Arabic GPT2 Auto-Generated Tweets

no code implementations COLING (WANLP) 2020 Fouzi Harrag, Maria Debbah, Kareem Darwish, Ahmed Abdelali

To the best of our knowledge, this work is the first study where ARABERT and GPT2 were combined to detect and classify the Arabic auto-generated texts.

Face Swapping Sentence +2

Pre-Training BERT on Arabic Tweets: Practical Considerations

no code implementations 21 Feb 2021 Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih

The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.

Cross-lingual Emotion Detection

no code implementations LREC 2022 Sabit Hassan, Shaden Shaar, Kareem Darwish

Next, we show that using cross-lingual approaches with English data alone, we can achieve more than 90% and 80% relative effectiveness of the Arabic and Spanish BERT models respectively.

A Few Topical Tweets are Enough for Effective User Stance Detection

no code implementations EACL 2021 Younes Samih, Kareem Darwish

We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics.

Clustering Stance Detection

ASAD: Arabic Social media Analytics and unDerstanding

no code implementations EACL 2021 Sabit Hassan, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish

This system demonstration paper describes ASAD: Arabic Social media Analysis and unDerstanding, a suite of seven individual modules that allows users to determine dialects, sentiment, news category, offensiveness, hate speech, adult content, and spam in Arabic tweets.

Arabic Curriculum Analysis

no code implementations COLING 2020 Hamdy Mubarak, Shimaa Amer, Ahmed Abdelali, Kareem Darwish

Developing a platform that analyzes the content of curricula can help identify their shortcomings and whether they are tailored to specific desired outcomes.

Improving Arabic Text Categorization Using Transformer Training Diversification

1 code implementation COLING (WANLP) 2020 Shammur Absar Chowdhury, Ahmed Abdelali, Kareem Darwish, Jung Soon-Gyo, Joni Salminen, Bernard J. Jansen

Automatic categorization of short texts, such as news headlines and social media posts, has many applications ranging from content analysis to recommendation systems.

Recommendation Systems Text Categorization

QADI: Arabic Dialect Identification in the Wild

no code implementations EACL (WANLP) 2021 Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

For extrinsic evaluation, we are able to build effective country level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.

Dialect Identification

Automatic Expansion and Retargeting of Arabic Offensive Language Training

no code implementations 18 Nov 2021 Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih

Rampant use of offensive language on social media led to recent efforts on automatic identification of such language.

NatiQ: An End-to-end Text-to-Speech System for Arabic

no code implementations 15 Jun 2022 Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish

We concatenated Tacotron1 with the WaveRNN vocoder, Tacotron2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms.

MTLens: Machine Translation Output Debugging

no code implementations LREC 2022 Shreyas Sharma, Kareem Darwish, Lucas Pavanelli, Thiago Castro Ferreira, Mohamed Al-Badrashiny, Kamer Ali Yuksel, Hassan Sawaf

The performance of Machine Translation (MT) systems varies significantly with inputs of diverging features such as topics, genres, and surface properties.

Benchmarking Machine Translation +2
