1 code implementation • 31 May 2025 • Ana Rita Valente, Rufael Marew, Hawau Olamide Toyin, Hamdan Al-Ali, Anelise Bohnen, Inma Becerra, Elsa Marta Soares, Goncalo Leal, Hanan Aldarmaki
Stuttering is a complex disorder that requires specialized expertise for effective assessment and treatment.
no code implementations • 26 May 2025 • Hawau Olamide Toyin, Rufael Marew, Humaid Alblooshi, Samar M. Magdy, Hanan Aldarmaki
We introduce ArVoice, a multi-speaker Modern Standard Arabic (MSA) speech corpus with diacritized transcriptions, intended for multi-speaker speech synthesis; it can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection.
no code implementations • 24 May 2025 • AbdelRahim Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed
Africa's rich linguistic diversity remains significantly underrepresented in speech technologies, creating barriers to digital inclusion.
no code implementations • 18 May 2025 • Amirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas
Speech Language Models (SLMs) enable natural interactions via spoken instructions, which more effectively capture user intent by detecting nuances in speech.
no code implementations • 27 Mar 2025 • Karima Kadaoui, Hanin Atwany, Hamdan Al-Ali, Abdelrahman Mohamed, Ali Mekky, Sergei Tilga, Natalia Fedorova, Ekaterina Artemova, Hanan Aldarmaki, Yova Kementchedjhieva
We introduce JEEM, a benchmark designed to evaluate Vision-Language Models (VLMs) on visual understanding across four Arabic-speaking countries: Jordan, The Emirates, Egypt, and Morocco.
1 code implementation • 8 Mar 2025 • Minghao Fu, Danning Li, Aryan Gadhiya, Benjamin Lambright, Mohamed Alowais, Mohab Bahnassy, Saad El Dine Elletter, Hawau Olamide Toyin, Haiyan Jiang, Kun Zhang, Hanan Aldarmaki
This paper addresses a major challenge in acoustic event detection, in particular infant cry detection in the presence of other sounds and background noises: the lack of precise annotated data.
1 code implementation • 13 Feb 2025 • Amirbek Djanibekov, Hanan Aldarmaki
With the growing influence of Large Language Models (LLMs), there is increasing interest in integrating speech representations with them to enable more seamless multi-modal processing and speech understanding.
1 code implementation • 7 Nov 2024 • Amirbek Djanibekov, Hawau Olamide Toyin, Raghad Alshalan, Abdullah Alitr, Hanan Aldarmaki
Developing robust automatic speech recognition (ASR) systems for Arabic requires effective strategies to manage its diversity.
1 code implementation • 24 Oct 2024 • Hawau Olamide Toyin, Hao Li, Hanan Aldarmaki
Speech recognition and speech synthesis models are typically trained separately, each with its own set of learning objectives, training data, and model parameters, resulting in two distinct large networks.
no code implementations • 7 Oct 2024 • Ibrahim Aldarmaki, Thamar Solorio, Bhiksha Raj, Hanan Aldarmaki
Neural multi-channel speech enhancement models, in particular those based on the U-Net architecture, demonstrate promising performance and generalization potential.
no code implementations • 29 Sep 2024 • Asif Hanif, Maha Tufail Agro, Mohammad Areeb Qazi, Hanan Aldarmaki
We demonstrate the effectiveness of our approach on 11 audio recognition datasets, encompassing a variety of speech-processing tasks, and compare the results with three baselines in a few-shot learning setup.
1 code implementation • 4 May 2024 • Maryam Al Ali, Hanan Aldarmaki
This paper introduces Mixat: a dataset of Emirati speech code-mixed with English.
1 code implementation • 15 Nov 2023 • Mohammad Amaan Sayeed, Hanan Aldarmaki
Text word embeddings that encode distributional semantics work by modeling contextual similarities of frequently occurring words.
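As an illustration of that idea, a minimal sketch of distributional semantics (not the paper's model; the toy corpus and window size are assumptions for demonstration): words that appear in similar contexts, like "cat" and "dog" below, end up with similar co-occurrence vectors.

```python
import numpy as np

# Toy corpus: "cat" and "dog" share contexts; "car" does not.
sentences = [
    "the cat chased the mouse",
    "the dog chased the mouse",
    "the cat ate food",
    "the dog ate food",
    "the car needs fuel",
]

vocab = sorted({w for s in sentences for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a symmetric window of 2 words.
counts = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    words = s.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if i != j:
                counts[idx[w], idx[words[j]]] += 1

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Shared contexts => high similarity; disjoint contexts => low.
print(cos(counts[idx["cat"]], counts[idx["dog"]]))  # 1.0
print(cos(counts[idx["cat"]], counts[idx["car"]]))  # 0.5
```

Dense embeddings such as word2vec learn compressed versions of exactly these contextual statistics, which is why they degrade for rare words with few observed contexts.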
1 code implementation • 15 Nov 2023 • Sara Shatnawi, Sawsan Alqahtani, Hanan Aldarmaki
Automatic text-based diacritic restoration models generally have high diacritic error rates when applied to speech transcripts as a result of domain and style shifts in spoken language.
1 code implementation • 25 Oct 2023 • Hawau Olamide Toyin, Amirbek Djanibekov, Ajinkya Kulkarni, Hanan Aldarmaki
We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.
Automatic Speech Recognition
no code implementations • 20 Oct 2023 • Ajinkya Kulkarni, Hanan Aldarmaki
We explore two architectural variations: ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants.
1 code implementation • 11 Oct 2023 • Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Aldarmaki
Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages.
Automatic Speech Recognition
no code implementations • 23 May 2023 • Maha Tufail Agro, Hanan Aldarmaki
Label noise refers to errors in training labels caused by cheap data annotation methods, such as web scraping or crowd-sourcing, which can be detrimental to the performance of supervised classifiers.
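To make the setting concrete, a minimal sketch of how symmetric label noise is typically simulated in experiments of this kind (the noise model and rates here are illustrative assumptions, not the paper's setup): each label is flipped to a uniformly chosen different class with some probability.

```python
import numpy as np

def inject_label_noise(labels, noise_rate, num_classes, rng):
    """Symmetric label noise: flip each label to a uniformly
    chosen *different* class with probability noise_rate."""
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    for i in np.where(flip)[0]:
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy

rng = np.random.default_rng(0)
clean = rng.integers(0, 5, size=2000)
noisy = inject_label_noise(clean, 0.2, 5, rng)
print((clean != noisy).mean())  # close to the 0.2 noise rate
```

Training on `noisy` while evaluating against `clean` labels is the standard way to measure how robust a classifier is to this kind of annotation error.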
no code implementations • 28 Feb 2023 • Ajinkya Kulkarni, Atharva Kulkarni, Sara Abedalmonem Mohammad Shatnawi, Hanan Aldarmaki
To help fill this gap in resources, we present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic.
no code implementations • 27 Feb 2023 • Hanan Aldarmaki, Ahmad Ghannam
We present an analysis of diacritic recognition performance in Arabic Automatic Speech Recognition (ASR) systems.
Automatic Speech Recognition
1 code implementation • 3 Jan 2023 • Sreepratha Ram, Hanan Aldarmaki
In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings.
no code implementations • 9 Jun 2021 • Hanan Aldarmaki, Asad Ullah, Nazar Zaki
Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of interest.
Automatic Speech Recognition
no code implementations • WS 2019 • Sawsan Alqahtani, Hanan Aldarmaki, Mona Diab
Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications.
1 code implementation • IJCNLP 2019 • Nada Almarwani, Hanan Aldarmaki, Mona Diab
Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure.
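The method being contrasted here is trivial to state in code; a minimal sketch (the 3-d vectors are made up for illustration) also makes its disregard for syntax visible: any permutation of the words yields the same sentence vector.

```python
import numpy as np

# Hypothetical tiny embedding table (3-d vectors for illustration).
emb = {
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.9, 0.3, 0.1]),
    "sat": np.array([0.2, 0.8, 0.4]),
}

def sentence_embedding(tokens, emb):
    """Average the vectors of in-vocabulary tokens.

    Word order is discarded entirely, which is the syntactic
    blindness that structured alternatives try to address.
    """
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

v1 = sentence_embedding("the cat sat".split(), emb)
v2 = sentence_embedding("sat cat the".split(), emb)
print(np.allclose(v1, v2))  # True: permutations collapse together
```

Despite this, averaging is a strong baseline in practice, which is what makes order-sensitive sentence encoders worth comparing against it.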
no code implementations • SEMEVAL 2019 • Hanan Aldarmaki, Mona Diab
We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models.
1 code implementation • NAACL 2019 • Hanan Aldarmaki, Mona Diab
Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space.
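The orthogonal fitting step described here has a well-known closed-form solution (orthogonal Procrustes via SVD); a minimal sketch with synthetic data, where the "target language" space is just a rotation of the source space, shows the map being recovered exactly:

```python
import numpy as np

def orthogonal_map(X, Y):
    """Fit an orthogonal W minimizing ||XW - Y||_F.

    Rows of X are source-space vectors; rows of Y are the
    corresponding target-space vectors from a bilingual
    dictionary. Closed-form Procrustes solution via SVD.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Synthetic check: build Y as a known rotation R of X.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
theta = np.pi / 3
R = np.eye(4)
R[0, 0] = R[1, 1] = np.cos(theta)
R[0, 1], R[1, 0] = -np.sin(theta), np.sin(theta)
Y = X @ R

W = orthogonal_map(X, Y)
print(np.allclose(W, R))                 # recovers the rotation
print(np.allclose(W.T @ W, np.eye(4)))   # W is orthogonal
```

The orthogonality constraint preserves distances and angles in the source space, which is why this mapping approach works well when the two embedding spaces are roughly isometric, and degrades when they are not.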
1 code implementation • COLING 2018 • Hanan Aldarmaki, Mona Diab
We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks.
no code implementations • TACL 2018 • Hanan Aldarmaki, Mahesh Mohan, Mona Diab
We show empirically that the performance of bilingual correspondents learned using our proposed unsupervised method is comparable to that of using supervised bilingual correspondents from a seed dictionary.