Search Results for author: Hanan Aldarmaki

Found 31 papers, 14 papers with code

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis

no code implementations 26 May 2025 Hawau Olamide Toyin, Rufael Marew, Humaid Alblooshi, Samar M. Magdy, Hanan Aldarmaki

We introduce ArVoice, a multi-speaker Modern Standard Arabic (MSA) speech corpus with diacritized transcriptions. It is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection.

DeepFake Detection Face Swapping +2

Voice of a Continent: Mapping Africa's Speech Technology Frontier

no code implementations 24 May 2025 AbdelRahim Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed

Africa's rich linguistic diversity remains significantly underrepresented in speech technologies, creating barriers to digital inclusion.

Diversity

SPIRIT: Patching Speech Language Models against Jailbreak Attacks

no code implementations 18 May 2025 Amirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas

Speech Language Models (SLMs) enable natural interactions via spoken instructions, which more effectively capture user intent by detecting nuances in speech.

JEEM: Vision-Language Understanding in Four Arabic Dialects

no code implementations 27 Mar 2025 Karima Kadaoui, Hanin Atwany, Hamdan Al-Ali, Abdelrahman Mohamed, Ali Mekky, Sergei Tilga, Natalia Fedorova, Ekaterina Artemova, Hanan Aldarmaki, Yova Kementchedjhieva

We introduce JEEM, a benchmark designed to evaluate Vision-Language Models (VLMs) on visual understanding across four Arabic-speaking countries: Jordan, The Emirates, Egypt, and Morocco.

Image Captioning Question Answering +1

Infant Cry Detection Using Causal Temporal Representation

1 code implementation 8 Mar 2025 Minghao Fu, Danning Li, Aryan Gadhiya, Benjamin Lambright, Mohamed Alowais, Mohab Bahnassy, Saad El Dine Elletter, Hawau Olamide Toyin, Haiyan Jiang, Kun Zhang, Hanan Aldarmaki

This paper addresses a major challenge in acoustic event detection, in particular infant cry detection in the presence of other sounds and background noises: the lack of precise annotated data.

Event Detection

SparQLe: Speech Queries to Text Translation Through LLMs

1 code implementation 13 Feb 2025 Amirbek Djanibekov, Hanan Aldarmaki

With the growing influence of Large Language Models (LLMs), there is increasing interest in integrating speech representations with them to enable more seamless multi-modal processing and speech understanding.

Speech-to-Text Speech-to-Text Translation +1

STTATTS: Unified Speech-To-Text And Text-To-Speech Model

1 code implementation 24 Oct 2024 Hawau Olamide Toyin, Hao Li, Hanan Aldarmaki

Speech recognition and speech synthesis models are typically trained separately, each with its own set of learning objectives, training data, and model parameters, resulting in two distinct large networks.

Multi-Task Learning speech-recognition +5

RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

no code implementations 7 Oct 2024 Ibrahim Aldarmaki, Thamar Solorio, Bhiksha Raj, Hanan Aldarmaki

Neural multi-channel speech enhancement models, in particular those based on the U-Net architecture, demonstrate promising performance and generalization potential.

Speech Enhancement

PALM: Few-Shot Prompt Learning for Audio Language Models

no code implementations 29 Sep 2024 Asif Hanif, Maha Tufail Agro, Mohammad Areeb Qazi, Hanan Aldarmaki

We demonstrate the effectiveness of our approach on 11 audio recognition datasets, encompassing a variety of speech-processing tasks, and compare the results with three baselines in a few-shot learning setup.

Few-Shot Learning Prompt Learning

Mixat: A Data Set of Bilingual Emirati-English Speech

1 code implementation 4 May 2024 Maryam Al Ali, Hanan Aldarmaki

This paper introduces Mixat: a dataset of Emirati speech code-mixed with English.

speech-recognition Speech Recognition

Spoken Word2Vec: Learning Skipgram Embeddings from Speech

1 code implementation 15 Nov 2023 Mohammad Amaan Sayeed, Hanan Aldarmaki

Text word embeddings that encode distributional semantics work by modeling contextual similarities of frequently occurring words.

Clustering Word Embeddings

Automatic Restoration of Diacritics for Speech Data Sets

1 code implementation 15 Nov 2023 Sara Shatnawi, Sawsan Alqahtani, Hanan Aldarmaki

Automatic text-based diacritic restoration models generally have high diacritic error rates when applied to speech transcripts as a result of domain and style shifts in spoken language.

ArTST: Arabic Text and Speech Transformer

1 code implementation 25 Oct 2023 Hawau Olamide Toyin, Amirbek Djanibekov, Ajinkya Kulkarni, Hanan Aldarmaki

We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Yet Another Model for Arabic Dialect Identification

no code implementations 20 Oct 2023 Ajinkya Kulkarni, Hanan Aldarmaki

We explore two architectural variations, ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants.

Dialect Identification model

Adapting the adapters for code-switching in multilingual ASR

1 code implementation 11 Oct 2023 Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Aldarmaki

Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Handling Realistic Label Noise in BERT Text Classification

no code implementations 23 May 2023 Maha Tufail Agro, Hanan Aldarmaki

Label noise refers to errors in training labels caused by cheap data annotation methods, such as web scraping or crowd-sourcing, which can be detrimental to the performance of supervised classifiers.

text-classification Text Classification

ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus

no code implementations 28 Feb 2023 Ajinkya Kulkarni, Atharva Kulkarni, Sara Abedalmonem Mohammad Shatnawi, Hanan Aldarmaki

In a move towards filling this gap in resources, we present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic.

Speech Synthesis text-to-speech +1

Diacritic Recognition Performance in Arabic ASR

no code implementations 27 Feb 2023 Hanan Aldarmaki, Ahmad Ghannam

We present an analysis of diacritic recognition performance in Arabic Automatic Speech Recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Supervised Acoustic Embeddings And Their Transferability Across Languages

1 code implementation 3 Jan 2023 Sreepratha Ram, Hanan Aldarmaki

In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings.

speech-recognition Speech Recognition +2

Unsupervised Automatic Speech Recognition: A Review

no code implementations 9 Jun 2021 Hanan Aldarmaki, Asad Ullah, Nazar Zaki

Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of interest.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Homograph Disambiguation Through Selective Diacritic Restoration

no code implementations WS 2019 Sawsan Alqahtani, Hanan Aldarmaki, Mona Diab

Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications.

Machine Translation Part-Of-Speech Tagging +2

Efficient Sentence Embedding using Discrete Cosine Transform

1 code implementation IJCNLP 2019 Nada Almarwani, Hanan Aldarmaki, Mona Diab

Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure.

Classification General Classification +3

Scalable Cross-Lingual Transfer of Neural Sentence Embeddings

no code implementations SEMEVAL 2019 Hanan Aldarmaki, Mona Diab

We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models.

Cross-Lingual Transfer Decoder +4

Context-Aware Cross-Lingual Mapping

1 code implementation NAACL 2019 Hanan Aldarmaki, Mona Diab

Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space.

Retrieval Sentence +4

Evaluation of Unsupervised Compositional Representations

1 code implementation COLING 2018 Hanan Aldarmaki, Mona Diab

We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks.

General Classification

Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

no code implementations TACL 2018 Hanan Aldarmaki, Mahesh Mohan, Mona Diab

We show empirically that the performance of bilingual correspondents learned using our proposed unsupervised method is comparable to that of using supervised bilingual correspondents from a seed dictionary.

Word Embeddings
