Packet loss is a major cause of voice quality degradation in VoIP transmissions, with a serious impact on intelligibility and user experience.
In detail, we explore multi-channel separation methods (mask-based beamforming and complex spectral mapping), as well as the best features to use in the ASR back-end model.
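For intuition, the sketch below shows the core of mask-based beamforming: a time-frequency mask from a separation network weights the spatial covariance estimates, from which a Souden-style MVDR filter is computed per frequency bin. All array names, shapes, and the specific MVDR variant are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of mask-based MVDR beamforming (assumed setup): a
# multi-channel STFT `X` of shape (channels, frames, freq) and a speech
# mask of shape (frames, freq) from some separation network.
import numpy as np

def mask_based_mvdr(X, mask, ref_ch=0, eps=1e-8):
    C, T, F = X.shape
    Y = np.zeros((T, F), dtype=np.complex128)
    for f in range(F):
        Xf = X[:, :, f]                              # (C, T)
        m_s = mask[:, f]                             # speech mask
        m_n = 1.0 - m_s                              # noise mask
        # Mask-weighted spatial covariance matrices.
        Phi_s = (m_s * Xf) @ Xf.conj().T / (m_s.sum() + eps)
        Phi_n = (m_n * Xf) @ Xf.conj().T / (m_n.sum() + eps)
        Phi_n += eps * np.eye(C)                     # regularization
        num = np.linalg.solve(Phi_n, Phi_s)          # Phi_n^{-1} Phi_s
        w = num[:, ref_ch] / (np.trace(num) + eps)   # MVDR filter weights
        Y[:, f] = w.conj() @ Xf                      # beamformed output
    return Y
```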
no code implementations • 23 Jun 2023 • Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur
The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems.
We found that, among all the methods considered, EEND-vector clustering (EEND-VC) offers the best trade-off between computing requirements and performance.
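As a rough illustration of the chunk-linking step at the heart of EEND-VC, the toy sketch below clusters per-chunk speaker embeddings across chunks to assign global speaker identities. The clustering choice, shapes, and function names are assumptions for illustration, not the authors' exact recipe (the `metric` keyword is called `affinity` in scikit-learn versions before 1.2).

```python
# Toy sketch of EEND-VC-style chunk linking: each chunk yields one
# embedding per locally active speaker; clustering those embeddings
# across chunks maps local speakers to global identities.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def link_chunks(chunk_embeddings, n_speakers):
    """chunk_embeddings: list of (local_speakers, dim) arrays, one per chunk."""
    all_embs = np.concatenate(chunk_embeddings, axis=0)
    labels = AgglomerativeClustering(
        n_clusters=n_speakers, metric="cosine", linkage="average"
    ).fit_predict(all_embs)
    # Map each chunk's local speaker indices to global cluster labels.
    mapping, offset = [], 0
    for embs in chunk_embeddings:
        mapping.append(labels[offset:offset + len(embs)])
        offset += len(embs)
    return mapping
```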
We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain.
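The sketch below illustrates the general full-/sub-band idea in PyTorch: a full-band LSTM runs across frequency within each frame, and a sub-band LSTM runs across time within each frequency bin, with residual connections. Layer sizes and the exact interleaving are assumptions for illustration; see the paper for the actual FSB-LSTM design.

```python
# Hypothetical full-band / sub-band block on an STFT-domain feature
# tensor of shape (batch, frames, freq, feat).
import torch
import torch.nn as nn

class FullSubBlock(nn.Module):
    def __init__(self, feat=32, hidden=64):
        super().__init__()
        # Full-band LSTM: runs across frequency, one sequence per frame.
        self.full = nn.LSTM(feat, hidden, batch_first=True, bidirectional=True)
        self.full_proj = nn.Linear(2 * hidden, feat)
        # Sub-band LSTM: runs across time, one sequence per frequency bin.
        self.sub = nn.LSTM(feat, hidden, batch_first=True)
        self.sub_proj = nn.Linear(hidden, feat)

    def forward(self, x):                            # x: (B, T, F, D)
        B, T, F, D = x.shape
        h = x.reshape(B * T, F, D)                   # frequency as sequence axis
        h = self.full_proj(self.full(h)[0]).reshape(B, T, F, D) + x
        s = h.transpose(1, 2).reshape(B * F, T, D)   # time as sequence axis
        s = self.sub_proj(self.sub(s)[0]).reshape(B, F, T, D).transpose(1, 2)
        return s + h

x = torch.randn(2, 100, 257, 32)
print(FullSubBlock()(x).shape)                       # torch.Size([2, 100, 257, 32])
```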
Finally, we show that the separated signals can also be readily used for automatic speech recognition, reaching performance close to that obtained with oracle sources in some configurations.
To address the challenges encountered in the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, to multi-channel, causal speech enhancement, and observe large improvements by replacing the TCNDenseNet used in iNeuBe with this new architecture; (2) we leverage a recent dual-window-size approach with future-frame prediction to ensure that iNeuBe-X satisfies the 5 ms constraint on algorithmic latency required by CEC2; (3) we introduce a novel speaker-conditioning branch for TF-GridNet to achieve target speaker extraction; (4) we propose a fine-tuning step in which we compute an additional loss with respect to the target speaker signal compensated with the listener audiogram.
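One common way to realize a speaker-conditioning branch like the one in novelty (3) is FiLM-style modulation, where an enrollment-derived speaker embedding produces feature-wise scales and shifts. The sketch below is a plausible illustration under that assumption, not the paper's actual TF-GridNet conditioning mechanism.

```python
# Hedged sketch of FiLM-based speaker conditioning: a speaker embedding
# modulates intermediate separator features via per-channel scale/shift.
import torch
import torch.nn as nn

class SpeakerFiLM(nn.Module):
    def __init__(self, emb_dim=192, feat_dim=32):
        super().__init__()
        self.scale = nn.Linear(emb_dim, feat_dim)
        self.shift = nn.Linear(emb_dim, feat_dim)

    def forward(self, feats, spk_emb):
        # feats: (B, T, F, D); spk_emb: (B, emb_dim) from an enrollment utterance
        gamma = self.scale(spk_emb)[:, None, None, :]   # (B, 1, 1, D)
        beta = self.shift(spk_emb)[:, None, None, :]
        return gamma * feats + beta                      # condition on the target speaker
```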
The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset.
1 code implementation • 19 Jul 2022 • Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe
To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.
Continuous speech separation (CSS) is a recently proposed framework that aims to separate each speaker from an input mixture signal in a streaming fashion.
In particular, we compare two low-latency speech separation models.
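To make the CSS framework concrete, the simplified sketch below runs a fixed-output-stream separator on overlapping chunks and aligns adjacent chunks by matching streams on the overlap region. The `separator` callable, chunk sizes, and correlation-based alignment are illustrative assumptions; a real system would also cross-fade the overlap rather than plain overlap-add.

```python
# Simplified CSS-style stitching of chunk-wise separation outputs.
import numpy as np
from itertools import permutations

def css_stitch(mixture, separator, chunk=32000, hop=16000, n_streams=2):
    out = np.zeros((n_streams, len(mixture)))
    prev = None
    for start in range(0, len(mixture) - chunk + 1, hop):
        est = separator(mixture[start:start + chunk])   # (n_streams, chunk)
        if prev is not None:
            ov = chunk - hop
            # Pick the stream permutation best correlated with the
            # previous chunk over the overlapping samples.
            best = max(
                permutations(range(n_streams)),
                key=lambda p: sum(
                    np.dot(prev[i, -ov:], est[p[i], :ov]) for i in range(n_streams)
                ),
            )
            est = est[list(best)]
        out[:, start:start + chunk] += est              # overlap-add (no window here)
        prev = est
    return out
```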
This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones.
In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!.
In many speech-enabled human-machine interaction scenarios, user speech can overlap with the device playback audio.
We perform a detailed analysis using the recent Clarity Challenge data and show that, by using learnt filterbanks, it is possible to surpass oracle-mask-based beamforming for short windows.
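A learnt analysis filterbank of the kind discussed here is typically just a strided 1-D convolution trained jointly with the enhancement network, in place of a fixed STFT. The sketch below is a minimal example; the window and stride values are illustrative of the short-window regime, not the paper's configuration.

```python
# Minimal learnt analysis/synthesis filterbank pair (assumed sizes).
import torch
import torch.nn as nn

win, stride, n_filters = 16, 8, 256        # 1 ms window at 16 kHz
encoder = nn.Conv1d(1, n_filters, kernel_size=win, stride=stride, bias=False)
decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size=win, stride=stride, bias=False)

wav = torch.randn(1, 1, 16000)             # 1 s of audio
feats = encoder(wav)                       # learnt T-F-like representation
recon = decoder(feats)                     # back to the waveform domain
print(feats.shape, recon.shape)
```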
First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures.
The Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes.
In recent years there has been a considerable rise in interest in Graph Representation and Learning techniques, especially where data intrinsically has a graph-like structure: social networks, molecular lattices, or semantic interactions, to name a few.
4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio
SpeechBrain is an open-source and all-in-one speech toolkit.
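As a short usage example, the snippet below transcribes a file with one of the pretrained models SpeechBrain hosts on HuggingFace. This follows the toolkit's documented pretrained-model interface (moved to `speechbrain.inference` in later releases); the example file path assumes the model card's bundled sample.

```python
# Load a pretrained LibriSpeech ASR model and transcribe an audio file.
from speechbrain.pretrained import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("speechbrain/asr-crdnn-rnnlm-librispeech/example.wav"))
```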
Fully exploiting ad-hoc microphone networks for distant speech recognition is still an open issue.
Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.
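The replacement of recurrence with attention is easy to see in code: instead of stepping through frames sequentially, a multi-head attention layer relates all frames in parallel. A minimal sketch with PyTorch's built-in module, with all sizes illustrative:

```python
# Multi-head self-attention over a sequence of speech frames.
import torch
import torch.nn as nn

x = torch.randn(8, 100, 256)                       # (batch, frames, features)
mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
out, attn_weights = mha(x, x, x)                   # self-attention: Q = K = V = x
print(out.shape)                                   # all frames processed in parallel
```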
Most deep learning-based speech separation models today are benchmarked on WSJ0-2mix.
no code implementations • 6 Nov 2019 • Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team.
We also validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.
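The advantage of a complex-valued mask over a real one is that it can modify both magnitude and phase of the STFT. The snippet below contrasts the two operations; tensor shapes are illustrative only.

```python
# Real magnitude mask vs. complex ratio mask applied to a complex STFT.
import torch

stft = torch.randn(1, 257, 100, dtype=torch.complex64)   # (batch, freq, frames)
real_mask = torch.rand(1, 257, 100)                      # magnitude-only mask
complex_mask = torch.randn(1, 257, 100, dtype=torch.complex64)

enhanced_mag_only = real_mask * stft       # scales magnitudes, keeps noisy phase
enhanced_complex = complex_mask * stft     # can also correct the phase
```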