no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali
With Twitter being one of the most popular social media platforms in the Arab region, it is not surprising to find accounts that post adult content in Arabic tweets, even though these platforms dissuade users from posting such content.
no code implementations • EACL (WANLP) 2021 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
For extrinsic evaluation, we are able to build effective country-level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.
1 code implementation • COLING (WANLP) 2020 • Shammur Absar Chowdhury, Ahmed Abdelali, Kareem Darwish, Soon-gyo Jung, Joni Salminen, Bernard J. Jansen
Automatic categorization of short texts, such as news headlines and social media posts, has many applications ranging from content analysis to recommendation systems.
no code implementations • 22 Jul 2024 • M Saiful Bari, Yazeed Alnumay, Norah A. Alzahrani, Nouf M. Alotaibi, Hisham A. Alyahya, Sultan Alrashed, Faisal A. Mirza, Shaykhah Z. Alsubaie, Hassan A. Alahmed, Ghadah Alabduljabbar, Raghad Alkhathran, Yousef Almushayqih, Raneem Alnajim, Salman AlSubaihi, Maryam Al Mansour, Majed Alrubaian, Ali Alammari, Zaki Alawami, Abdulmohsen Al-Thubaity, Ahmed Abdelali, Jeril Kuriakose, Abdalghani Abujabal, Nora Al-Twairesh, Areeb Alowisheq, Haidar Khan
We present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT).
1 code implementation • 23 May 2024 • Basel Mousi, Nadir Durrani, Fahim Dalvi, Majd Hawasly, Ahmed Abdelali
Our analysis focuses on quantifying the alignment and overlap of these concepts across various languages within the latent space.
1 code implementation • 9 Aug 2023 • Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam
In this study, we introduce the LLMeBench framework, which can be seamlessly customized to evaluate LLMs for any NLP task, regardless of language.
no code implementations • 24 May 2023 • Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Yousseif Elshahawy, Ahmed Ali, Nadir Durrani, Natasa Milic-Frayling, Firoj Alam
Our findings provide valuable insights into the applicability of LLMs for Arabic NLP and speech processing tasks.
no code implementations • 18 Oct 2022 • Ahmed Abdelali, Nadir Durrani, Fahim Dalvi, Hassan Sajjad
Given the success of pre-trained language models, many transformer models trained on Arabic and its dialects have surfaced.
no code implementations • 15 Jun 2022 • Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish
We concatenated Tacotron1 with the WaveRNN vocoder, Tacotron2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms.
no code implementations • 19 Jan 2022 • Ahmed Abdelali, Nadir Durrani, Fahim Dalvi, Hassan Sajjad
Arabic is a Semitic language which is widely spoken with many dialects.
no code implementations • 7 Jan 2022 • Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur
The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that speech recognition (ASR) systems handle mixed language.
no code implementations • 18 Nov 2021 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih
Rampant use of offensive language on social media has led to recent efforts on the automatic identification of such language.
no code implementations • 31 May 2021 • Shammur Absar Chowdhury, Amir Hussein, Ahmed Abdelali, Ahmed Ali
We evaluate the system's performance on (i) monolingual (Ar, En and Fr); (ii) multi-dialectal (Modern Standard Arabic, along with dialectal variations such as Egyptian and Moroccan); and (iii) code-switching, both cross-lingual (Ar-En/Fr) and dialectal (MSA-Egyptian dialect), test cases, and compare it with current state-of-the-art systems.
Automatic Speech Recognition (ASR)
no code implementations • EACL 2021 • Sabit Hassan, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
This system demonstration paper describes ASAD: Arabic Social media Analysis and unDerstanding, a suite of seven individual modules that allows users to determine dialects, sentiment, news category, offensiveness, hate speech, adult content, and spam in Arabic tweets.
no code implementations • 21 Feb 2021 • Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih
The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.
no code implementations • COLING (WANLP) 2020 • Fouzi Harrag, Maria Debbah, Kareem Darwish, Ahmed Abdelali
To the best of our knowledge, this work is the first study in which AraBERT and GPT-2 were combined to detect and classify auto-generated Arabic texts.
no code implementations • COLING 2020 • Hassan Sajjad, Ahmed Abdelali, Nadir Durrani, Fahim Dalvi
The evaluation suite and the dialectal system are publicly available for research purposes.
no code implementations • SEMEVAL 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali
This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media.
no code implementations • COLING 2020 • Hamdy Mubarak, Shimaa Amer, Ahmed Abdelali, Kareem Darwish
Developing a platform that analyzes the content of curricula can help identify their shortcomings and whether they are tailored to specific desired outcomes.
1 code implementation • 15 Jul 2020 • Firoj Alam, Fahim Dalvi, Shaden Shaar, Nadir Durrani, Hamdy Mubarak, Alex Nikolov, Giovanni Da San Martino, Ahmed Abdelali, Hassan Sajjad, Kareem Darwish, Preslav Nakov
With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information including statistics, warnings, advice, and inspirational stories.
no code implementations • 13 May 2020 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.
no code implementations • LREC 2020 • Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Abdelali, Soon-gyo Jung, Bernard J Jansen, Joni Salminen
Hence, it is important to detect offensive comments on social media platforms.
no code implementations • LREC 2020 • Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali
In this paper, we introduce a generic method for collecting parallel tweets.
no code implementations • LREC 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Ammar Rashed, Shammur Absar Chowdhury
In this paper, we describe our efforts at OSACT Shared Task on Offensive Language Detection.
2 code implementations • Findings (EMNLP) 2021 • Firoj Alam, Shaden Shaar, Fahim Dalvi, Hassan Sajjad, Alex Nikolov, Hamdy Mubarak, Giovanni Da San Martino, Ahmed Abdelali, Nadir Durrani, Kareem Darwish, Abdulaziz Al-Homaid, Wajdi Zaghouani, Tommaso Caselli, Gijs Danoe, Friso Stolk, Britt Bruntink, Preslav Nakov
With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged as the problem got elevated to a whole new level to become the first global infodemic.
no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali
Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization.
no code implementations • 4 Feb 2020 • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Mohamed Eldesouki
Our model surpasses all previous state-of-the-art systems with a CW error rate (CWER) of 2.86% and a CE error rate (CEER) of 3.7% for Modern Standard Arabic (MSA) and CWER of 2.2% and CEER of 2.5% for Classical Arabic (CA).
no code implementations • IJCNLP 2019 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Mohamed Eldesouki, Younes Samih, Hassan Sajjad
Short vowels, aka diacritics, are more often omitted when writing different varieties of Arabic including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA).
no code implementations • RANLP 2019 • Irina Temnikova, Ahmed Abdelali, Souhila Djabri, Samy Hedaya
We analyze several speaker and interpreter variables via manual annotation and automatic methods.
Automatic Speech Recognition (ASR)
no code implementations • WS 2019 • Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, Didier Schwab
Word Embeddings (WE) are getting increasingly popular and widely applied in many Natural Language Processing (NLP) applications due to their effectiveness in capturing semantic properties of words; Machine Translation (MT), Information Retrieval (IR) and Information Extraction (IE) are among such areas.
no code implementations • WS 2019 • Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed Eldesouki, Kareem Darwish
This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification).
no code implementations • WS 2019 • Mohammed Attia, Younes Samih, Ali Elkahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language.
no code implementations • NAACL 2019 • Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, Kareem Darwish
Arabic text is typically written without short vowels (or diacritics).
no code implementations • 15 Oct 2018 • Ahmed Abdelali, Mohammed Attia, Younes Samih, Kareem Darwish, Hamdy Mubarak
The diacritization process attempts to restore the short vowels in written Arabic text, which are typically omitted.
no code implementations • ACL 2017 • Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Ahmed Abdelali, Yonatan Belinkov, Stephan Vogel
Word segmentation plays a pivotal role in improving any Arabic NLP application.
no code implementations • RANLP 2017 • Irina Temnikova, Ahmed Abdelali, Samy Hedaya, Stephan Vogel, Aishah Al Daher
In this article we run an automatic analysis of a corpus of parallel speeches and their human interpretations, and provide the results of manually annotating the human interpreting strategies in a sample of the corpus.
2 code implementations • 19 Aug 2017 • Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy Mubarak, Kareem Darwish, Laura Kallmeyer
Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval.
Ranked #1 on Sentiment Analysis on DynaSent (10 fold Cross validation metric, using extra training data)
no code implementations • CONLL 2017 • Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer
Arabic dialects do not just share a common koiné, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other.
no code implementations • WS 2017 • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali
In this paper, we present a new and fast state-of-the-art Arabic diacritizer that guesses the diacritics of words and then their case endings.
no code implementations • WS 2017 • Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish
The automated processing of Arabic Dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general.
no code implementations • WS 2017 • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki
However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVM-Rank based tagger yields further improvements.
no code implementations • EACL 2017 • Fahim Dalvi, Yifan Zhang, Sameer Khurana, Nadir Durrani, Hassan Sajjad, Ahmed Abdelali, Hamdy Mubarak, Ahmed Ali, Stephan Vogel
This paper presents QCRI's Arabic-to-English live speech translation system.
no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.
Automatic Speech Recognition (ASR)
no code implementations • COLING 2016 • Nadir Durrani, Hassan Sajjad, Shafiq Joty, Ahmed Abdelali
We present a novel fusion model for domain adaptation in Statistical Machine Translation.
no code implementations • 18 Jun 2016 • Hassan Sajjad, Nadir Durrani, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Stephan Vogel, Wael Salloum, Ahmed El Kholy, Nizar Habash
The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech.
no code implementations • LREC 2016 • Hamdy Mubarak, Ahmed Abdelali
We present a novel approach for mining data from Twitter for the purpose of building transliteration resources and systems.
no code implementations • LREC 2014 • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak
This paper presents an end-to-end automatic processing system for Arabic.
no code implementations • LREC 2014 • Ahmed Abdelali, Francisco Guzman, Hassan Sajjad, Stephan Vogel
This paper presents the AMARA corpus of on-line educational content: a new parallel corpus of educational video subtitles, multilingually aligned for 20 languages, i.e., 20 monolingual corpora and 190 parallel corpora.