no code implementations • LREC 2022 • Shreyas Sharma, Kareem Darwish, Lucas Pavanelli, Thiago Castro Ferreira, Mohamed Al-Badrashiny, Kamer Ali Yuksel, Hassan Sawaf
The performance of Machine Translation (MT) systems varies significantly with inputs of diverging features such as topics, genres, and surface properties.
no code implementations • EACL (WANLP) 2021 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
For extrinsic evaluation, we are able to build effective country-level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.
1 code implementation • COLING (WANLP) 2020 • Shammur Absar Chowdhury, Ahmed Abdelali, Kareem Darwish, Jung Soon-Gyo, Joni Salminen, Bernard J. Jansen
Automatic categorization of short texts, such as news headlines and social media posts, has many applications ranging from content analysis to recommendation systems.
no code implementations • 12 Aug 2024 • Abdelrahman El-Sheikh, Ahmed Elmogtaba, Kareem Darwish, Muhammad Elmallah, Ashraf Elneima, Hassan Sawaf
The debut of ChatGPT and Bard has popularized instruction-following text generation using LLMs, where a user can interrogate an LLM using natural language requests and obtain natural language answers that match their requests.
1 code implementation • 26 Feb 2024 • Ahmet Gunduz, Kamer Ali Yuksel, Kareem Darwish, Golara Javadi, Fabio Minazzi, Nicola Sobieski, Sebastien Bratieres
Data availability is crucial for advancing artificial intelligence applications, including voice-based technologies.
no code implementations • 15 Jun 2022 • Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish
We concatenated Tacotron1 with the WaveRNN vocoder, Tacotron2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms.
no code implementations • 18 Nov 2021 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih
Rampant use of offensive language on social media has led to recent efforts on automatic identification of such language.
no code implementations • LREC 2022 • Sabit Hassan, Shaden Shaar, Kareem Darwish
Next, we show that using cross-lingual approaches with English data alone, we can achieve more than 90% and 80% relative effectiveness of the Arabic and Spanish BERT models respectively.
no code implementations • EACL 2021 • Younes Samih, Kareem Darwish
We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics.
no code implementations • EACL 2021 • Sabit Hassan, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
This system demonstration paper describes ASAD: Arabic Social media Analysis and unDerstanding, a suite of seven individual modules that allows users to determine dialects, sentiment, news category, offensiveness, hate speech, adult content, and spam in Arabic tweets.
no code implementations • 21 Feb 2021 • Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih
The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.
no code implementations • COLING (WANLP) 2020 • Fouzi Harrag, Maria Debbah, Kareem Darwish, Ahmed Abdelali
To the best of our knowledge, this work is the first study in which AraBERT and GPT-2 were combined to detect and classify auto-generated Arabic texts.
no code implementations • COLING 2020 • Hamdy Mubarak, Shimaa Amer, Ahmed Abdelali, Kareem Darwish
Developing a platform that analyzes the content of curricula can help identify their shortcomings and whether they are tailored to specific desired outcomes.
no code implementations • 25 Nov 2020 • Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Hussein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak
The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.
no code implementations • 19 Jul 2020 • Chereen Shurafa, Kareem Darwish, Wajdi Zaghouani
Through the use of Twitter, framing has become a prominent presidential campaign tool for politically active users.
1 code implementation • 15 Jul 2020 • Firoj Alam, Fahim Dalvi, Shaden Shaar, Nadir Durrani, Hamdy Mubarak, Alex Nikolov, Giovanni Da San Martino, Ahmed Abdelali, Hassan Sajjad, Kareem Darwish, Preslav Nakov
With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information including statistics, warnings, advice, and inspirational stories.
no code implementations • ACL 2020 • Peter Stefanov, Kareem Darwish, Atanas Atanasov, Preslav Nakov
Discovering the stances of media outlets and influential people on current, debatable topics is important for social statisticians and policy makers.
1 code implementation • 19 May 2020 • Ammar Rashed, Mucahid Kutlu, Kareem Darwish, Tamer Elsayed, Cansin Bayrak
On June 24, 2018, Turkey conducted a highly consequential election in which the Turkish people elected their president and parliament in the first election under a new presidential system.
Ranked #1 on Stance Detection on Trump Midterm Elections 2018
no code implementations • 13 May 2020 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.
no code implementations • LREC 2020 • Hamdy Mubarak, Kareem Darwish, Walid Magdy, Tamer Elsayed, Hend Al-Khalifa
This paper provides an overview of the offensive language detection shared task at the 4th workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4).
2 code implementations • Findings (EMNLP) 2021 • Firoj Alam, Shaden Shaar, Fahim Dalvi, Hassan Sajjad, Alex Nikolov, Hamdy Mubarak, Giovanni Da San Martino, Ahmed Abdelali, Nadir Durrani, Kareem Darwish, Abdulaziz Al-Homaid, Wajdi Zaghouani, Tommaso Caselli, Gijs Danoe, Friso Stolk, Britt Bruntink, Preslav Nakov
With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged as the problem got elevated to a whole new level to become the first global infodemic.
no code implementations • 7 Apr 2020 • Younes Samih, Kareem Darwish
We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics.
no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali
Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization.
no code implementations • 4 Feb 2020 • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Mohamed Eldesouki
Our model surpasses all previous state-of-the-art systems with a CW error rate (CWER) of 2.86% and a CE error rate (CEER) of 3.7% for Modern Standard Arabic (MSA) and CWER of 2.2% and CEER of 2.5% for Classical Arabic (CA).
no code implementations • 5 Jan 2020 • Kareem Darwish
This paper addresses polarization quantification, particularly as it pertains to the nomination of Brett Kavanaugh to the US Supreme Court and his subsequent confirmation with the narrowest margin since 1881.
Social and Information Networks J.4
no code implementations • IJCNLP 2019 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Mohamed Eldesouki, Younes Samih, Hassan Sajjad
Short vowels, also known as diacritics, are often omitted when writing different varieties of Arabic including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA).
no code implementations • IJCNLP 2019 • Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov
We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what's behind a news story.
no code implementations • 23 Sep 2019 • Mucahid Kutlu, Kareem Darwish, Cansin Bayrak, Ammar Rashed, Tamer Elsayed
During the election period, the Turkish people extensively shared their political opinions on Twitter.
Social and Information Networks
no code implementations • WS 2019 • Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed Eldesouki, Kareem Darwish
This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification).
no code implementations • WS 2019 • Mohammed Attia, Younes Samih, Ali Elkahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language.
no code implementations • 2 Jul 2019 • Peter Stefanov, Kareem Darwish, Atanas Atanasov, Preslav Nakov
Discovering the stances of media outlets and influential people on current, debatable topics is important for social statisticians and policy makers.
no code implementations • NAACL 2019 • Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, Kareem Darwish
Arabic text is typically written without short vowels (or diacritics).
2 code implementations • 3 Apr 2019 • Kareem Darwish, Peter Stefanov, Michaël J. Aupetit, Preslav Nakov
We experiment with different combinations of user similarity features, dataset sizes, dimensionality reduction methods, and clustering algorithms to ascertain the most effective and most computationally efficient combinations across three different datasets (in English and Turkish).
Social and Information Networks 62P25, 91D30
no code implementations • 15 Oct 2018 • Ahmed Abdelali, Mohammed Attia, Younes Samih, Kareem Darwish, Hamdy Mubarak
The diacritization process attempts to restore the short vowels in written Arabic text, which are typically omitted.
2 code implementations • 19 Aug 2017 • Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy Mubarak, Kareem Darwish, Laura Kallmeyer
Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval.
Ranked #1 on Sentiment Analysis on DynaSent (10 fold Cross validation metric, using extra training data)
no code implementations • CONLL 2017 • Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer
Arabic dialects do not just share a common koiné, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other.
no code implementations • WS 2017 • Hamdy Mubarak, Kareem Darwish, Walid Magdy
We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean).
no code implementations • WS 2017 • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali
In this paper, we present a new and fast state-of-the-art Arabic diacritizer that guesses the diacritics of words and then their case endings.
no code implementations • WS 2017 • Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish
The automated processing of Arabic Dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general.
no code implementations • WS 2017 • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki
However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVM-Rank based tagger yields further improvements.
no code implementations • WS 2016 • Mohamed Eldesouki, Fahim Dalvi, Hassan Sajjad, Kareem Darwish
We submitted four runs to the Arabic sub-task.
no code implementations • LREC 2016 • Kareem Darwish, Hamdy Mubarak
Meanwhile, Farasa is nearly one order of magnitude faster than QATARA and two orders of magnitude faster than MADAMIRA.
no code implementations • SEMEVAL 2015 • Massimo Nicosia, Simone Filice, Alberto Barrón-Cedeño, Iman Saleh, Hamdy Mubarak, Wei Gao, Preslav Nakov, Giovanni Da San Martino, Alessandro Moschitti, Kareem Darwish, Lluís Màrquez, Shafiq Joty, Walid Magdy
no code implementations • LREC 2014 • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak
This paper presents an end-to-end automatic processing system for Arabic.
no code implementations • LREC 2014 • Kareem Darwish, Wei Gao
Despite many recent papers on Arabic Named Entity Recognition (NER) in the news domain, little work has been done on microblog NER.
no code implementations • WS 2014 • Kareem Darwish
In this paper, we address the problems of identifying Arabizi in text and converting it to Arabic characters.