no code implementations • OSACT (LREC) 2022 • Hamdy Mubarak, Hend Al-Khalifa, Abdulmohsen Al-Thubaity
This paper provides an overview of the shared task on detecting offensive language, hate speech, and fine-grained hate speech at the fifth workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5).
no code implementations • EACL (WANLP) 2021 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
For extrinsic evaluation, we are able to build effective country level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.
no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali
With Twitter being one of the most popular social media platforms in the Arab region, it is not surprising to find accounts that post adult content in Arabic tweets, despite the fact that these platforms dissuade users from such content.
no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Sabit Hassan
Mapping user locations to countries can be useful for many applications such as dialect identification, author profiling, and recommendation systems.
no code implementations • 5 Aug 2024 • Yassine El Kheir, Hamdy Mubarak, Ahmed Ali, Shammur Absar Chowdhury
Phonetically correct transcribed speech resources for dialectal Arabic are scarce.
1 code implementation • 7 Mar 2024 • Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov
Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output.
no code implementations • 6 Nov 2023 • Maram Hasanain, Firoj Alam, Hamdy Mubarak, Samir Abdaljalil, Wajdi Zaghouani, Preslav Nakov, Giovanni Da San Martino, Abed Alhakim Freihat
We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023.
1 code implementation • 9 Aug 2023 • Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam
In this study, we introduce the LLMeBench framework, which can be seamlessly customized to evaluate LLMs for any NLP task, regardless of language.
no code implementations • 24 May 2023 • Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Yousseif Elshahawy, Ahmed Ali, Nadir Durrani, Natasa Milic-Frayling, Firoj Alam
Our findings provide valuable insights into the applicability of LLMs for Arabic NLP and speech processing tasks.
no code implementations • 9 May 2023 • Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury, Hamdy Mubarak, Shazia Afzal, Ahmed Ali
This paper introduces QVoice, a novel Arabic pronunciation learning application powered by an end-to-end mispronunciation detection and feedback generator module.
no code implementations • 5 May 2023 • Hamdy Mubarak, Samir Abdaljalil, Azza Nassar, Firoj Alam
Social media platforms empower us in several ways, from information dissemination to consumption.
1 code implementation • 22 Jan 2023 • Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali
Several high-resource Text to Speech (TTS) systems currently produce natural, well-established human-like speech.
Automatic Speech Recognition (ASR) +2
no code implementations • 22 Nov 2022 • Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali
Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition.
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Nov 2022 • Firoj Alam, Hamdy Mubarak, Wajdi Zaghouani, Giovanni Da San Martino, Preslav Nakov
Thus, there has been a lot of recent research on automatic detection of propaganda techniques in text as well as in memes.
no code implementations • 2 Nov 2022 • Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali, Hamdy Mubarak, Shazia Afzal
Our proposed technique achieves state-of-the-art results on Speechocean762 for ASR-dependent mispronunciation detection models at the phoneme level, with a 2.0% gain in Pearson Correlation Coefficient (PCC) compared to the previous state-of-the-art [1].
Ranked #4 on Phone-level pronunciation scoring on speechocean762
no code implementations • 15 Jun 2022 • Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish
We concatenated Tacotron1 with the WaveRNN vocoder, Tacotron2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms.
no code implementations • COLING (WNUT) 2022 • Hamdy Mubarak, Shammur Absar Chowdhury, Firoj Alam
Gender analysis of Twitter can reveal important socio-cultural differences between male and female users.
no code implementations • 18 Jan 2022 • Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury
To assess generalization capability, we evaluate our models on external datasets: a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube and Facebook.
no code implementations • LREC 2022 • Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury, Firoj Alam
We studied the data for individual types of tweets and temporal changes in stance towards vaccine.
no code implementations • 18 Nov 2021 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih
Rampant use of offensive language on social media led to recent efforts on automatic identification of such language.
no code implementations • ACL 2021 • Hamdy Mubarak, Amir Hussein, Shammur Absar Chowdhury, Ahmed Ali
We also report the first baseline for Arabic punctuation restoration.
Automatic Speech Recognition (ASR) +8
no code implementations • 24 Jun 2021 • Hamdy Mubarak, Amir Hussein, Shammur Absar Chowdhury, Ahmed Ali
We also report the first baseline for Arabic punctuation restoration.
Automatic Speech Recognition (ASR) +8
no code implementations • EACL 2021 • Sabit Hassan, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
This system demonstration paper describes ASAD: Arabic Social media Analysis and unDerstanding, a suite of seven individual modules that allows users to determine dialects, sentiment, news category, offensiveness, hate speech, adult content, and spam in Arabic tweets.
no code implementations • 21 Feb 2021 • Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih
The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.
no code implementations • EACL (Louhi) 2021 • Hamdy Mubarak, Sabit Hassan
Over the past few months, there were huge numbers of circulating tweets and discussions about Coronavirus (COVID-19) in the Arab region.
no code implementations • COLING 2020 • Hamdy Mubarak, Shimaa Amer, Ahmed Abdelali, Kareem Darwish
Developing a platform that analyzes the content of curricula can help identify their shortcomings and whether they are tailored to specific desired outcomes.
no code implementations • SEMEVAL 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali
This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media.
no code implementations • 25 Nov 2020 • Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Hussein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak
The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.
1 code implementation • 15 Jul 2020 • Firoj Alam, Fahim Dalvi, Shaden Shaar, Nadir Durrani, Hamdy Mubarak, Alex Nikolov, Giovanni Da San Martino, Ahmed Abdelali, Hassan Sajjad, Kareem Darwish, Preslav Nakov
With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information including statistics, warnings, advice, and inspirational stories.
no code implementations • SEMEVAL 2020 • Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çağrı Çöltekin
We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020).
no code implementations • 13 May 2020 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.
no code implementations • LREC 2020 • Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Abdelali, Soon-gyo Jung, Bernard J Jansen, Joni Salminen
Hence, it is important to detect offensive comments in social media platforms.
no code implementations • LREC 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Ammar Rashed, Shammur Absar Chowdhury
In this paper, we describe our efforts at OSACT Shared Task on Offensive Language Detection.
no code implementations • LREC 2020 • Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali
In this paper, we introduce a generic method for collecting parallel tweets.
no code implementations • LREC 2020 • Hamdy Mubarak, Kareem Darwish, Walid Magdy, Tamer Elsayed, Hend Al-Khalifa
This paper provides an overview of the offensive language detection shared task at the 4th workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4).
2 code implementations • Findings (EMNLP) 2021 • Firoj Alam, Shaden Shaar, Fahim Dalvi, Hassan Sajjad, Alex Nikolov, Hamdy Mubarak, Giovanni Da San Martino, Ahmed Abdelali, Nadir Durrani, Kareem Darwish, Abdulaziz Al-Homaid, Wajdi Zaghouani, Tommaso Caselli, Gijs Danoe, Friso Stolk, Britt Bruntink, Preslav Nakov
With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged as the problem got elevated to a whole new level to become the first global infodemic.
no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali
Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization.
no code implementations • 4 Feb 2020 • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Mohamed Eldesouki
Our model surpasses all previous state-of-the-art systems with a CW error rate (CWER) of 2.86% and a CE error rate (CEER) of 3.7% for Modern Standard Arabic (MSA) and CWER of 2.2% and CEER of 2.5% for Classical Arabic (CA).
no code implementations • SEMEVAL 2016 • Preslav Nakov, Lluís Màrquez, Alessandro Moschitti, Walid Magdy, Hamdy Mubarak, Abed Alhakim Freihat, James Glass, Bilal Randeree
This paper describes the SemEval-2016 Task 3 on Community Question Answering, which we offered in English and Arabic.
1 code implementation • SEMEVAL 2017 • Preslav Nakov, Doris Hoogeveen, Lluís Màrquez, Alessandro Moschitti, Hamdy Mubarak, Timothy Baldwin, Karin Verspoor
We describe SemEval-2017 Task 3 on Community Question Answering.
no code implementations • IJCNLP 2019 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Mohamed Eldesouki, Younes Samih, Hassan Sajjad
Short vowels, aka diacritics, are more often omitted when writing different varieties of Arabic including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA).
no code implementations • WS 2019 • Mohammed Attia, Younes Samih, Ali Elkahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language.
no code implementations • WS 2019 • Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed Eldesouki, Kareem Darwish
This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification).
no code implementations • NAACL 2019 • Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, Kareem Darwish
Arabic text is typically written without short vowels (or diacritics).
no code implementations • 15 Oct 2018 • Ahmed Abdelali, Mohammed Attia, Younes Samih, Kareem Darwish, Hamdy Mubarak
The diacritization process attempts to restore the short vowels in written Arabic text, which are typically omitted.
no code implementations • LREC 2018 • Hamdy Mubarak
In this paper we describe the complexity of building a lemmatizer for Arabic, which has a rich and complex derivational morphology, and we discuss the need for fast and accurate lemmatization to enhance Arabic Information Retrieval (IR) results.
2 code implementations • 19 Aug 2017 • Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy Mubarak, Kareem Darwish, Laura Kallmeyer
Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval.
Ranked #1 on Sentiment Analysis on DynaSent (using extra training data)
no code implementations • WS 2017 • Hamdy Mubarak, Kareem Darwish, Walid Magdy
We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean).
no code implementations • CONLL 2017 • Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer
Arabic dialects do not just share a common koiné, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other.
no code implementations • EACL 2017 • Fahim Dalvi, Yifan Zhang, Sameer Khurana, Nadir Durrani, Hassan Sajjad, Ahmed Abdelali, Hamdy Mubarak, Ahmed Ali, Stephan Vogel
This paper presents QCRI's Arabic-to-English live speech translation system.
no code implementations • WS 2017 • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali
In this paper, we present a new and fast state-of-the-art Arabic diacritizer that guesses the diacritics of words and then their case endings.
no code implementations • WS 2017 • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki
However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVM-Rank based tagger yields further improvements.
no code implementations • WS 2017 • Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish
The automated processing of Arabic Dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general.
no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang
For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website Aljazeera.net over a 10-year period (2000-2011).
no code implementations • LREC 2016 • Kareem Darwish, Hamdy Mubarak
Meanwhile, Farasa is nearly one order of magnitude faster than QATARA and two orders of magnitude faster than MADAMIRA.
no code implementations • LREC 2016 • Hamdy Mubarak, Ahmed Abdelali
We present a novel approach for mining data from Twitter for the purpose of building transliteration resources and systems.
no code implementations • SEMEVAL 2015 • Massimo Nicosia, Simone Filice, Alberto Barrón-Cedeño, Iman Saleh, Hamdy Mubarak, Wei Gao, Preslav Nakov, Giovanni Da San Martino, Alessandro Moschitti, Kareem Darwish, Lluís Màrquez, Shafiq Joty, Walid Magdy
no code implementations • LREC 2014 • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak
This paper presents an end-to-end automatic processing system for Arabic.