no code implementations • 14 Sep 2024 • Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe
We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models.
1 code implementation • 17 Aug 2024 • Samuele Cornell, Jordan Darefsky, Zhiyao Duan, Shinji Watanabe
In this work, we propose a synthetic data generation pipeline for multi-speaker conversational ASR, leveraging a large language model (LLM) for content creation and a conversational multi-speaker text-to-speech (TTS) model for speech synthesis.
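The two-stage structure of such a pipeline (LLM writes the conversation, multi-speaker TTS renders it) can be sketched as below. This is a minimal illustration, not the paper's implementation: `generate_transcript` and `synthesize` are hypothetical stand-ins for the LLM and TTS components, and turns are naively concatenated with no overlap or room simulation.

```python
import random

def generate_transcript(topic: str, n_turns: int, seed: int = 0) -> list:
    """Mock LLM: returns (speaker, utterance) turns for a two-speaker chat."""
    rng = random.Random(seed)
    speakers = ["spk1", "spk2"]
    fillers = ["I see.", "Tell me more.", "That makes sense.", "Interesting point."]
    return [(speakers[i % 2], f"[{topic}] " + rng.choice(fillers))
            for i in range(n_turns)]

def synthesize(speaker: str, text: str) -> list:
    """Mock multi-speaker TTS: one sample per character, sign keyed to speaker."""
    amp = 0.5 if speaker == "spk1" else -0.5
    return [amp] * len(text)

def build_conversation(topic: str, n_turns: int):
    """LLM -> TTS -> concatenate: yields (waveform, speaker-attributed transcript)."""
    transcript = generate_transcript(topic, n_turns)
    audio = []
    for speaker, text in transcript:
        audio.extend(synthesize(speaker, text))  # naive concatenation, no overlap
    return audio, transcript

audio, transcript = build_conversation("weather", 4)
```

A real pipeline would replace the mocks with actual model calls and additionally simulate overlaps, pauses, and acoustics, but the data flow is the same.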
no code implementations • 23 Jul 2024 • Samuele Cornell, Taejin Park, Steve Huang, Christoph Boeddeker, Xuankai Chang, Matthew Maciejewski, Matthew Wiesner, Paola Garcia, Shinji Watanabe
This paper presents the CHiME-8 DASR challenge, which carries on from the previous CHiME-7 DASR (C7DASR) edition and the earlier CHiME-6 challenge.
no code implementations • 19 Jun 2024 • Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian
In this paper, we propose to use scores from discriminative models in the first steps of the reverse diffusion process (RDP).
no code implementations • 12 Jun 2024 • Samuele Cornell, Janek Ebbers, Constance Douwes, Irene Martín-Morató, Manu Harju, Annamaria Mesaros, Romain Serizel
The Detection and Classification of Acoustic Scenes and Events Challenge Task 4 aims to advance sound event detection (SED) systems in domestic environments by leveraging training data with different supervision uncertainty.
no code implementations • 7 Jun 2024 • Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian
The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE).
1 code implementation • 27 Oct 2023 • Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis
TorchAudio is an open-source audio and speech processing library built for PyTorch.
no code implementations • 2 Oct 2023 • Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini
This paper presents a novel framework for joint speaker diarization (SD) and automatic speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented recognition).
Automatic Speech Recognition (ASR)
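The "sliding-window" part of SLIDAR can be illustrated with the window enumeration below. This is only a sketch of the segmentation step under my own naming, not the paper's method: in the actual framework each window is processed by a local speaker-attributed system and hypotheses are combined across windows, which is omitted here.

```python
def sliding_windows(total_len_s: float, win_s: float = 5.0, hop_s: float = 2.5):
    """Enumerate (start, end) windows, in seconds, covering a long recording.

    Consecutive windows overlap by (win_s - hop_s) seconds so that local
    outputs can later be stitched consistently across window boundaries.
    """
    windows = []
    start = 0.0
    while start < total_len_s:
        windows.append((start, min(start + win_s, total_len_s)))
        if start + win_s >= total_len_s:  # this window already reaches the end
            break
        start += hop_s
    return windows

# A 12 s recording with 5 s windows and a 2.5 s hop:
wins = sliding_windows(12.0)
```

The overlap between consecutive windows is what allows speaker labels and word hypotheses to be linked globally across the recording.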
1 code implementation • 28 Jul 2023 • Carlo Aironi, Samuele Cornell, Luca Serafini, Stefano Squartini
Packet loss is a major cause of voice quality degradation in VoIP transmissions with serious impact on intelligibility and user experience.
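For context, the classical baseline that neural packet loss concealment methods (such as the one in this paper) are measured against is zero-order hold: repeat the last correctly received packet, attenuated. The sketch below shows that baseline only, not the paper's approach; entries of `packets` at lost indices are placeholders and never read.

```python
def conceal(packets, lost, packet_len, fade=0.5):
    """Zero-order-hold PLC: replace each lost packet with the last good
    packet scaled by `fade`; use silence if nothing was received yet."""
    out, last = [], None
    for i, pkt in enumerate(packets):
        if i in lost:
            fill = [0.0] * packet_len if last is None else [s * fade for s in last]
            out.append(fill)
        else:
            out.append(list(pkt))
            last = pkt
    return out

# Packet 1 of three is lost; it is replaced by packet 0 at half amplitude.
repaired = conceal([[1.0, 1.0], [9.9, 9.9], [3.0, 3.0]], lost={1}, packet_len=2)
```

Repetition-based concealment causes the audible artifacts the paper targets; learned methods instead predict plausible missing content from context.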
no code implementations • 23 Jul 2023 • Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe
In detail, we explore multi-channel separation methods, mask-based beamforming and complex spectral mapping, as well as the best features to use in the ASR back-end model.
Automatic Speech Recognition (ASR)
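The time-frequency masking idea underlying mask-based beamforming can be shown in miniature. In the systems above a neural network estimates the mask and the mask then drives spatial covariance estimation for a beamformer; the sketch below skips both of those and just computes an oracle ratio mask from known magnitudes and applies it, to make the masking operation itself concrete. All names are mine.

```python
def ratio_mask(target_mag, mixture_mag, eps=1e-8):
    """Per T-F bin ratio of target to mixture magnitude, clipped to [0, 1]."""
    return [[min(1.0, t / (m + eps)) for t, m in zip(t_row, m_row)]
            for t_row, m_row in zip(target_mag, mixture_mag)]

def apply_mask(mask, mixture_stft):
    """Scale each complex mixture bin by its (real-valued) mask weight."""
    return [[w * x for w, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture_stft)]

# One frame, two frequency bins: the target holds half of bin 0, all of bin 1.
mask = ratio_mask([[1.0, 4.0]], [[2.0, 4.0]])
est = apply_mask(mask, [[2.0 + 0.0j, 4.0j]])
```

Complex spectral mapping, the other method the entry mentions, would instead predict the real and imaginary parts of the target spectrum directly rather than a real-valued mask.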
no code implementations • 23 Jun 2023 • Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur
The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR)
no code implementations • 29 May 2023 • Luca Serafini, Samuele Cornell, Giovanni Morrone, Enrico Zovato, Alessio Brutti, Stefano Squartini
We found that, among all methods considered, EEND-vector clustering (EEND-VC) offers the best trade-off in terms of computing requirements and performance.
no code implementations • 18 Apr 2023 • Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe
We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain.
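The full-/sub-band decomposition at the heart of FSB-LSTM can be illustrated with the partitioning step alone: full-band layers see all frequency bins of an STFT frame jointly, while sub-band layers process overlapping groups of neighboring bins. The sketch below shows only this slicing, with band size and stride chosen arbitrarily; the LSTM modeling on top of each view is omitted.

```python
def split_subbands(stft_frame, band_size, stride):
    """Slice one frame of frequency bins into overlapping sub-bands.

    Each sub-band would be modeled by a (weight-shared) sub-band network,
    while the untouched full frame feeds the full-band path.
    """
    bands, f = [], 0
    while f + band_size <= len(stft_frame):
        bands.append(stft_frame[f:f + band_size])
        f += stride
    return bands

# 8 bins, bands of 4 with stride 2 -> three overlapping sub-bands.
bands = split_subbands(list(range(8)), band_size=4, stride=2)
```

The overlap between adjacent bands lets local spectral patterns be modeled without losing continuity across band edges.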
no code implementations • 21 Mar 2023 • Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
Recent works show that speech separation guided diarization (SSGD) is an increasingly promising direction, mainly thanks to the recent progress in speech separation.
no code implementations • 15 Feb 2023 • Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono
To address the challenges encountered in the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, for multi-channel, causal speech enhancement, and large improvements are observed by replacing the TCNDenseNet used in iNeuBe with this new architecture; (2) we leverage a recent dual window size approach with future-frame prediction to ensure that iNeuBe-X satisfies the 5 ms constraint on algorithmic latency required by CEC2; (3) we introduce a novel speaker-conditioning branch for TF-GridNet to achieve target speaker extraction; (4) we propose a fine-tuning step, where we compute an additional loss with respect to the target speaker signal compensated with the listener audiogram.
no code implementations • 14 Oct 2022 • Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis
The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset.
1 code implementation • 19 Jul 2022 • Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe
To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.
Automatic Speech Recognition (ASR)
1 code implementation • 19 Jun 2022 • Luca Della Libera, Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin
Transformers have recently achieved state-of-the-art performance in speech separation.
no code implementations • 31 May 2022 • Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini
Continuous speech separation (CSS) is a recently proposed framework which aims at separating each speaker from an input mixture signal in a streaming fashion.
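A core step in any CSS pipeline is stitching the per-chunk separator outputs into continuous streams: because a separation model's output channels have no fixed speaker order, the permutation of the current chunk must be aligned to the previous one on their overlapped region. The sketch below shows that alignment for the two-channel case, using mean squared error on the overlap; it is a generic illustration under my own naming, not the specific models compared in these papers.

```python
def overlap_mse(a, b):
    """Mean squared error between two equally long overlap segments."""
    n = max(1, min(len(a), len(b)))
    return sum((x - y) ** 2 for x, y in zip(a, b)) / n

def align_permutation(prev_tail, cur_head):
    """Choose the channel order for the current chunk (2-channel case) that
    best matches the previous chunk on the overlapped audio region."""
    keep = overlap_mse(prev_tail[0], cur_head[0]) + overlap_mse(prev_tail[1], cur_head[1])
    swap = overlap_mse(prev_tail[0], cur_head[1]) + overlap_mse(prev_tail[1], cur_head[0])
    return (0, 1) if keep <= swap else (1, 0)

# The new chunk emits the speakers in swapped order -> permutation (1, 0).
perm = align_permutation(([1.0, 1.0], [0.0, 0.0]), ([0.0, 0.0], [1.0, 1.0]))
```

After reordering, chunks are typically cross-faded (overlap-added) on the shared region to produce seamless output streams.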
1 code implementation • 5 Apr 2022 • Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
In particular, we compare two low-latency speech separation models.
no code implementations • 24 Feb 2022 • Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe
This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones.
1 code implementation • 6 Feb 2022 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi
In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!.
Ranked #1 on Speech Enhancement on WHAM!
no code implementations • 20 Nov 2021 • Samuele Cornell, Thomas Balestri, Thibaud Sénéchal
In many speech-enabled human-machine interaction scenarios, user speech can overlap with the device playback audio.
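The classical signal-processing answer to this overlap problem, which learned approaches are typically compared against, is an adaptive echo canceller that uses the known playback signal as a reference. The sketch below is that standard NLMS baseline, not the method of this paper: it subtracts a filtered copy of the playback reference from the microphone signal.

```python
import math

def nlms_aec(mic, ref, order=4, mu=0.5, eps=1e-6):
    """Normalized LMS adaptive filter: estimate the echo as a linear filter
    applied to the playback reference `ref`, subtract it from `mic`, and
    adapt the filter from the residual error."""
    w = [0.0] * order
    out = []
    for n in range(len(mic)):
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        y = sum(wi * xi for wi, xi in zip(w, x))  # current echo estimate
        e = mic[n] - y                            # residual (desired output)
        norm = eps + sum(xi * xi for xi in x)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out

# Synthetic check: the microphone picks up only a delayed, scaled playback echo,
# so after convergence the residual should be close to zero.
ref = [math.sin(0.3 * n) for n in range(2000)]
mic = [0.8 * ref[n - 1] if n >= 1 else 0.0 for n in range(2000)]
residual = nlms_aec(mic, ref)
```

When user speech is also present, it appears in the residual `e`, which is why echo cancellation is a natural front-end for the interaction scenario described above.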
no code implementations • 8 Nov 2021 • Samuele Cornell, Manuel Pariente, François Grondin, Stefano Squartini
We perform a detailed analysis using the recent Clarity Challenge data and show that by using learnt filterbanks it is possible to surpass oracle-mask based beamforming for short windows.
1 code implementation • 20 Oct 2021 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin
First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures.
1 code implementation • 28 Sep 2021 • Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell
The Detection and Classification of Acoustic Scenes and Events Challenge 2021 Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes.
no code implementations • 2021 29th European Signal Processing Conference (EUSIPCO) 2021 • Carlo Aironi, Samuele Cornell, Emanuele Principi, Stefano Squartini
In recent years there has been a considerable rise in interest towards Graph Representation and Learning techniques, especially in cases where data intrinsically has a graph-like structure: social networks, molecular lattices, or semantic interactions, just to name a few.
4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio
SpeechBrain is an open-source, all-in-one speech toolkit.
1 code implementation • 6 Apr 2021 • Samuele Cornell, Alessio Brutti, Marco Matassoni, Stefano Squartini
Fully exploiting ad-hoc microphone networks for distant speech recognition is still an open issue.
4 code implementations • 25 Oct 2020 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong
Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.
Ranked #7 on Speech Separation on WSJ0-3mix
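The attention mechanism that replaces recurrence in these models is scaled dot-product attention. The sketch below shows just that primitive on plain Python lists; the SepFormer's dual-path arrangement, multiple heads, and learned projections are all omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Single-head scaled dot-product attention: for each query, mix the
    value vectors with weights given by softmax(Q·K / sqrt(d))."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# A query matching the first key almost exclusively retrieves the first value.
out = attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Unlike a recurrent update, every score here can be computed in parallel across positions, which is what makes the attention-based separators above efficient to train.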
5 code implementations • 22 May 2020 • Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent
Most deep learning-based speech separation models today are benchmarked on the wsj0-2mix dataset.
Audio and Speech Processing
no code implementations • 6 Nov 2019 • Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team.
2 code implementations • 23 Oct 2019 • Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent
Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.