no code implementations • 26 May 2025 • Pooneh Mousavi, Yingzhi Wang, Mirco Ravanelli, Cem Subakan
A key consideration for these models is the cross-modal alignment between the text and audio modalities, which is a telltale sign of whether the LLM is able to associate semantic meaning with audio segments.
no code implementations • 19 May 2025 • Yingzhi Wang, Anas Alhmoud, Saad Alsahly, Muhammad Alqurishi, Mirco Ravanelli
Our findings reveal that only 3 of the 20 heads account for over 75% of the hallucinations on the UrbanSound dataset.
1 code implementation • 18 Dec 2024 • Yingzhi Wang, Anas Alhmoud, Muhammad Alqurishi
In recent years, the enhanced capabilities of ASR models and the emergence of multi-dialect datasets have increasingly pushed Arabic ASR model development toward an all-dialect-in-one direction.
1 code implementation • 22 Sep 2024 • Yingzhi Wang, Pooneh Mousavi, Artem Ploujnikov, Mirco Ravanelli
In audio and speech processing, tasks usually focus on either the audio or speech modality, even when both sounds and human speech are present in the same audio clip.
no code implementations • 29 Jun 2024 • Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, Jarod Duret, Salima Mdhaffar, Gaelle Laperriere, Mickael Rouvier, Renato de Mori, Yannick Esteve
This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face.
3 code implementations • 22 Jun 2023 • Yingzhi Wang, Mirco Ravanelli, Alya Yacoubi
Speech Emotion Recognition (SER) typically relies on utterance-level solutions.
no code implementations • 4 Nov 2021 • Yingzhi Wang, Abdelmoumene Boumadane, Abdelwahab Heba
Speech self-supervised models such as wav2vec 2.0 and HuBERT are making revolutionary progress in Automatic Speech Recognition (ASR).
Ranked #2 on Slot Filling on SLURP
Automatic Speech Recognition (ASR)