no code implementations • 8 Mar 2025 • Santiago Cuervo, Adel Moumen, Yanis Labrak, Sameer Khurana, Antoine Laurent, Mickael Rouvier, Ricard Marxer
Text-Speech Language Models (TSLMs) -- language models trained to jointly process and generate text and speech -- aim to enable cross-modal knowledge transfer to overcome the scaling limitations of unimodal speech LMs.
no code implementations • 4 Sep 2024 • Ryan Whetten, Titouan Parcollet, Adel Moumen, Marco Dinarelli, Yannick Estève
Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing.
1 code implementation • 30 Aug 2024 • Ada Defne Tur, Adel Moumen, Mirco Ravanelli
Large Language Models (LLMs) have shown their ability to improve the performance of speech recognizers by effectively rescoring the n-best hypotheses generated during the beam search process.
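As a rough illustration of this idea (not the paper's exact recipe), n-best rescoring can be sketched by combining each hypothesis's beam-search score with a length-normalised log-likelihood from a causal LLM; the model choice, weighting, and normalisation below are illustrative assumptions.

```python
# Hypothetical sketch of LLM-based n-best rescoring (illustrative only):
# each beam-search hypothesis gets its ASR score combined with a
# length-normalised log-likelihood from a causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # model choice is an assumption
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

def llm_log_likelihood(text: str) -> float:
    """Average log-likelihood per token of a hypothesis under the LLM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)
    return -out.loss.item()  # out.loss is the mean negative log-likelihood

def rescore(nbest, lm_weight=0.5):
    """nbest: list of (hypothesis, asr_score) pairs from beam search."""
    best_hyp, _ = max(
        ((hyp, asr_score + lm_weight * llm_log_likelihood(hyp))
         for hyp, asr_score in nbest),
        key=lambda pair: pair[1],
    )
    return best_hyp
```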
no code implementations • 29 Jun 2024 • Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, Jarod Duret, Salima Mdhaffar, Gaëlle Laperrière, Mickael Rouvier, Renato de Mori, Yannick Estève
This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face.
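For context, a minimal usage sketch of loading one of the pretrained models distributed on Hugging Face; it assumes SpeechBrain 1.0's speechbrain.inference interface and the speechbrain/asr-crdnn-rnnlm-librispeech model, and a local audio file path.

```python
# A minimal usage sketch, assuming SpeechBrain 1.0's inference API and the
# publicly released speechbrain/asr-crdnn-rnnlm-librispeech model.
from speechbrain.inference.ASR import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("example.wav"))  # "example.wav" is a placeholder path
```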
no code implementations • 9 Jun 2024 • Yanis Labrak, Adel Moumen, Richard Dufour, Mickael Rouvier
In the rapidly evolving landscape of spoken question-answering (SQA), the integration of large language models (LLMs) has emerged as a transformative development.
no code implementations • 16 Feb 2023 • Adel Moumen, Titouan Parcollet
The light gated recurrent unit (Li-GRU) is well known for achieving impressive results in automatic speech recognition (ASR) tasks while being lighter and faster to train than a standard gated recurrent unit (GRU).
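For reference, a minimal sketch of a standard Li-GRU cell as introduced by Ravanelli et al.: the reset gate is dropped and the candidate state uses a ReLU activation with batch normalisation on the feed-forward projections. This is background on the architecture the paper builds on, not the paper's proposed variant; the class and parameter names are illustrative.

```python
# Minimal Li-GRU cell sketch (Ravanelli et al., 2018): the reset gate is
# removed and the candidate state uses ReLU with batch normalisation on the
# feed-forward projections. Shown as background, not the paper's new variant.
import torch
import torch.nn as nn

class LiGRUCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.wz = nn.Linear(input_size, hidden_size, bias=False)
        self.uz = nn.Linear(hidden_size, hidden_size, bias=False)
        self.wh = nn.Linear(input_size, hidden_size, bias=False)
        self.uh = nn.Linear(hidden_size, hidden_size, bias=False)
        self.bn_z = nn.BatchNorm1d(hidden_size)
        self.bn_h = nn.BatchNorm1d(hidden_size)

    def forward(self, x, h):
        # Update gate only; Li-GRU drops the reset gate of the standard GRU.
        z = torch.sigmoid(self.bn_z(self.wz(x)) + self.uz(h))
        # ReLU candidate state instead of tanh.
        h_cand = torch.relu(self.bn_h(self.wh(x)) + self.uh(h))
        return z * h + (1.0 - z) * h_cand
```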