no code implementations • 28 Aug 2023 • Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data.
1 code implementation • 30 Jun 2023 • Giuseppe Alessio D'Inverno, Simone Brugiapaglia, Mirco Ravanelli
They are usually based on a message-passing mechanism and have gained increasing popularity for their intuitive formulation, which is closely linked to the Weisfeiler-Lehman (WL) test for graph isomorphism, to which they have been proven equivalent in terms of expressive power.
1 code implementation • 22 Jun 2023 • Yingzhi Wang, Mirco Ravanelli, Alaa Nfissi, Alya Yacoubi
Speech Emotion Recognition (SER) typically relies on utterance-level solutions.
no code implementations • 6 Jun 2023 • Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer, Ivana Kruijff Korbayova, Josef van Genabith
Despite the recent advancements in speech recognition, there are still difficulties in accurately transcribing conversational and emotional speech in noisy and reverberant acoustic environments.
1 code implementation • 1 Jun 2023 • Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data.
1 code implementation • CVPR 2023 • AmirMohammad Sarfi, Zahra Karimpour, Muawiz Chaudhary, Nasir M. Khalid, Mirco Ravanelli, Sudhir Mudur, Eugene Belilovsky
Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initialization of later layers.
no code implementations • 22 Mar 2023 • Francesco Paissan, Cem Subakan, Mirco Ravanelli
In this paper, we introduce a new approach, called Posthoc Interpretation via Quantization (PIQ), for interpreting decisions made by trained classifiers.
1 code implementation • 12 Mar 2023 • Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings.
Automatic Speech Recognition (ASR)
no code implementations • 27 Jul 2022 • Artem Ploujnikov, Mirco Ravanelli
End-to-end speech synthesis models directly convert the input characters into an audio representation (e.g., spectrograms).
1 code implementation • 19 Jun 2022 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin
Transformers have recently achieved state-of-the-art performance in speech separation.
1 code implementation • 15 May 2022 • Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, Paris Smaragdis
In this paper, we work on a sound recognition system that continually incorporates new sound classes.
1 code implementation • 6 Feb 2022 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi
In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!.
Ranked #1 on Speech Enhancement on WHAM!
no code implementations • 10 Nov 2021 • Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli
Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.
1 code implementation • 20 Oct 2021 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin
First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures.
1 code implementation • 12 Oct 2021 • Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao
Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training.
1 code implementation • 18 Jul 2021 • Juan Manuel Mayor-Torres, Mirco Ravanelli, Sara E. Medina-DeVilliers, Matthew D. Lerner, Giuseppe Riccardi
This result is consistent with recent neuroscience studies on emotion recognition, which found an association between these band suppressions and the behavioral deficits observed in individuals with ASD.
4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio
SpeechBrain is an open-source and all-in-one speech toolkit.
3 code implementations • 8 Apr 2021 • Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao
The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.
Ranked #11 on Speech Enhancement on VoiceBank + DEMAND
1 code implementation • 4 Apr 2021 • Loren Lugosch, Piyush Papreja, Mirco Ravanelli, Abdelwahab Heba, Titouan Parcollet
This paper introduces Timers and Such, a new open source dataset of spoken English commands for common voice control use cases involving numbers.
Ranked #4 on Spoken Language Understanding on Timers and Such (using extra training data)
no code implementations • 3 Apr 2021 • Nauman Dawalatabad, Mirco Ravanelli, François Grondin, Jenthe Thienpondt, Brecht Desplanques, Hwidong Na
Learning robust speaker embeddings is a crucial step in speaker diarization.
no code implementations • 27 Feb 2021 • Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio
In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation.
3 code implementations • 25 Oct 2020 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong
Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.
Ranked #5 on Speech Separation on WSJ0-2mix
1 code implementation • 19 Oct 2020 • François Grondin, Jean-Samuel Lauzon, Simon Michaud, Mirco Ravanelli, François Michaud
This paper introduces BIRD, the Big Impulse Response Dataset.
Sound • Audio and Speech Processing
1 code implementation • 18 May 2020 • Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas Lane, Mohamed Morchid
In this paper, we propose to capture these inter- and intra- structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities.
Automatic Speech Recognition (ASR)
1 code implementation • 25 Jan 2020 • Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
2 code implementations • 21 Oct 2019 • Loren Lugosch, Brett Meyer, Derek Nowrouzezahrai, Mirco Ravanelli
End-to-end models are an attractive new approach to spoken language understanding (SLU) in which the meaning of an utterance is inferred directly from the raw audio without employing the standard pipeline composed of a separately trained speech recognizer and natural language understanding module.
Ranked #7 on Spoken Language Understanding on Snips-SmartLights
1 code implementation • 25 Sep 2019 • Chiheb Trabelsi, Olexa Bilaniuk, Ousmane Dia, Ying Zhang, Mirco Ravanelli, Jonathan Binas, Negar Rostamzadeh, Christopher J Pal
Using the Wall Street Journal Dataset, we compare our phase-aware loss to several others that operate both in the time and frequency domains and demonstrate the effectiveness of our proposed signal extraction method and proposed loss.
no code implementations • NeurIPS Workshop Deep_Invers 2019 • Chiheb Trabelsi, Olexa Bilaniuk, Ousmane Dia, Ying Zhang, Mirco Ravanelli, Jonathan Binas, Negar Rostamzadeh, Christopher J Pal
Building on recent advances, we propose a new deep complex-valued method for signal retrieval and extraction in the frequency domain.
1 code implementation • 7 Apr 2019 • Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio
Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model.
Ranked #14 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio
Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.
Ranked #2 on Distant Speech Recognition on DIRHA English WSJ
2 code implementations • 13 Dec 2018 • Mirco Ravanelli, Yoshua Bengio
Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones.
2 code implementations • 1 Dec 2018 • Mirco Ravanelli, Yoshua Bengio
Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way.
1 code implementation • 23 Nov 2018 • Mirco Ravanelli, Yoshua Bengio
Deep learning is currently playing a crucial role toward higher levels of artificial intelligence.
Ranked #3 on Distant Speech Recognition on DIRHA English WSJ
no code implementations • 21 Nov 2018 • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Renato de Mori
Neural network architectures are at the core of powerful automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR)
11 code implementations • 19 Nov 2018 • Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
Ranked #1 on Distant Speech Recognition on DIRHA English WSJ
25 code implementations • 29 Jul 2018 • Mirco Ravanelli, Yoshua Bengio
Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
3 code implementations • ICLR 2019 • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato de Mori, Yoshua Bengio
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence.
Automatic Speech Recognition (ASR)
no code implementations • 26 May 2018 • Mirco Ravanelli, Maurizio Omologo
Distant speech recognition is being revolutionized by deep learning, which has contributed to significantly outperforming previous HMM-GMM systems.
1 code implementation • 15 Apr 2018 • Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio
Online speech recognition is crucial for developing natural human-machine interfaces.
1 code implementation • 26 Mar 2018 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR).
Ranked #6 on Speech Recognition on TIMIT
Automatic Speech Recognition (ASR)
no code implementations • 17 Dec 2017 • Mirco Ravanelli
Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence.
1 code implementation • 26 Nov 2017 • Mirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo
The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology.
Audio and Speech Processing • Sound
1 code implementation • 10 Oct 2017 • Mirco Ravanelli, Maurizio Omologo
Despite the significant progress made in recent years, state-of-the-art speech recognition technologies provide satisfactory performance only in the close-talking condition.
2 code implementations • 6 Oct 2017 • Mirco Ravanelli, Maurizio Omologo
This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project.
1 code implementation • 29 Sep 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
First, we suggest removing the reset gate in the GRU design, resulting in a more efficient single-gate architecture.
no code implementations • 24 Mar 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Improving distant speech recognition is a crucial step towards flexible human-machine interfaces.
no code implementations • 23 Mar 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met.
no code implementations • LREC 2014 • Luca Cristoforetti, Mirco Ravanelli, Maurizio Omologo, Alessandro Sosi, Alberto Abad, Martin Hagmueller, Petros Maragos
This paper describes a multi-microphone multi-language acoustic corpus being developed under the EC project Distant-speech Interaction for Robust Home Applications (DIRHA).