Search Results for author: Samuele Cornell

Found 27 papers, 13 papers with code

Attention is All You Need in Speech Separation

4 code implementations • 25 Oct 2020 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong

Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.

Ranked #7 on Speech Separation on WSJ0-3mix

Speech Separation

7,879

Paper
Code

SpeechBrain: A General-Purpose Speech Toolkit

4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio

SpeechBrain is an open-source and all-in-one speech toolkit.

Language Identification Spoken Language Understanding

7,879

Paper
Code

Exploring Self-Attention Mechanisms for Speech Separation

1 code implementation • 6 Feb 2022 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi

In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!.

Ranked #1 on Speech Enhancement on WHAM!

Denoising Speech Enhancement +1

7,879

Paper
Code

Resource-Efficient Separation Transformer

1 code implementation • 19 Jun 2022 • Luca Della Libera, Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin

Transformers have recently achieved state-of-the-art performance in speech separation.

Speech Separation

7,879

Paper
Code

REAL-M: Towards Speech Separation on Real Mixtures

1 code implementation • 20 Oct 2021 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin

First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures.

Open-Ended Question Answering Speech Separation

7,878

Paper
Code

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

1 code implementation • 19 Jul 2022 • Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe

To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

7,871

Paper
Code

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

1 code implementation • 27 Oct 2023 • Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

TorchAudio is an open-source audio and speech processing library built for PyTorch.

Self-Supervised Learning Speech Enhancement +2

2,379

Paper
Code

Filterbank design for end-to-end speech separation

2 code implementations • 23 Oct 2019 • Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent

Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.

Speaker Recognition Speech Separation

2,107

Paper
Code

LibriMix: An Open-Source Dataset for Generalizable Speech Separation

5 code implementations • 22 May 2020 • Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent

Most deep learning-based speech separation models today are benchmarked on it.

Audio and Speech Processing

2,107

Paper
Code

The impact of non-target events in synthetic soundscapes for sound event detection

1 code implementation • 28 Sep 2021 • Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell

Detection and Classification Acoustic Scene and Events Challenge 2021 Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes.

Event Detection Sound Event Detection

111

Paper
Code

Learning to Rank Microphones for Distant Speech Recognition

1 code implementation • 6 Apr 2021 • Samuele Cornell, Alessio Brutti, Marco Matassoni, Stefano Squartini

Fully exploiting ad-hoc microphone networks for distant speech recognition is still an open issue.

Distant Speech Recognition Learning-To-Rank +1

Paper
Code

Low-Latency Speech Separation Guided Diarization for Telephone Conversations

1 code implementation • 5 Apr 2022 • Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

In particular, we compare two low-latency speech separation models.

Action Detection Activity Detection +5

Paper
Code

A Time-Frequency Generative Adversarial based method for Audio Packet Loss Concealment

1 code implementation • 28 Jul 2023 • Carlo Aironi, Samuele Cornell, Luca Serafini, Stefano Squartini

Packet loss is a major cause of voice quality degradation in VoIP transmissions with serious impact on intelligibility and user experience.

Image-to-Image Translation Packet Loss Concealment +1

Paper
Code

The Speed Submission to DIHARD II: Contributions & Lessons Learned

no code implementations • 6 Nov 2019 • Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras

This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team.

Action Detection Activity Detection +4

Paper
Add Code

Learning Filterbanks for End-to-End Acoustic Beamforming

no code implementations • 8 Nov 2021 • Samuele Cornell, Manuel Pariente, François Grondin, Stefano Squartini

We perform a detailed analysis using the recent Clarity Challenge data and show that by using learnt filterbanks it is possible to surpass oracle-mask based beamforming for short windows.

Paper
Add Code

Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection

no code implementations • 20 Nov 2021 • Samuele Cornell, Thomas Balestri, Thibaud Sénéchal

In many speech-enabled human-machine interaction scenarios, user speech can overlap with the device playback audio.

Acoustic echo cancellation Keyword Spotting

Paper
Add Code

Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge

no code implementations • 24 Feb 2022 • Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe

This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones.

Speech Enhancement

Paper
Add Code

Conversational Speech Separation: an Evaluation Study for Streaming Applications

no code implementations • 31 May 2022 • Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini

Continuous speech separation (CSS) is a recently proposed framework which aims at separating each speaker from an input mixture signal in a streaming fashion.

Speech Separation

Paper
Add Code

Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system

no code implementations • 14 Oct 2022 • Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis

The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using an heterogeneous dataset.

Event Segmentation

Paper
Add Code

Graph-based Representation of Audio signals for Sound Event Classification

no code implementations • 2021 29th European Signal Processing Conference (EUSIPCO) 2021 • Carlo Aironi, Samuele Cornell, Emanuele Principi, Stefano Squartini

In recent years there has been a considerable rise in interest towards Graph Representation and Learning techniques, especially in such cases where data has intrinsically a graph- like structure: social networks, molecular lattices, or semantic interactions, just to name a few.

Paper
Add Code

Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

no code implementations • 15 Feb 2023 • Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono

To address the challenges encountered in the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, for multi-channel, causal speech enhancement, and large improvements are observed by replacing the TCNDenseNet used in iNeuBe with this new architecture; (2) we leverage a recent dual window size approach with future-frame prediction to ensure that iNueBe-X satisfies the 5 ms constraint on algorithmic latency required by CEC2; (3) we introduce a novel speaker-conditioning branch for TF-GridNet to achieve target speaker extraction; (4) we propose a fine-tuning step, where we compute an additional loss with respect to the target speaker signal compensated with the listener audiogram.

Speaker Separation Speech Enhancement +1

Paper
Add Code

End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations

no code implementations • 21 Mar 2023 • Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

Finally, we also show that the separated signals can be readily used also for automatic speech recognition, reaching performance close to using oracle sources in some configurations.

Action Detection Activity Detection +4

Paper
Add Code

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

no code implementations • 18 Apr 2023 • Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain.

Speech Enhancement

Paper
Add Code

An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings

no code implementations • 29 May 2023 • Luca Serafini, Samuele Cornell, Giovanni Morrone, Enrico Zovato, Alessio Brutti, Stefano Squartini

We found that, among all methods considered, EEND-vector clustering (EEND-VC) offers the best trade-off in terms of computing requirements and performance.

Clustering speaker-diarization +4

Paper
Add Code

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

no code implementations • 23 Jun 2023 • Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur

The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

no code implementations • 23 Jul 2023 • Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe

In detail, we explore multi-channel separation methods, mask-based beamforming and complex spectral mapping, as well as the best features to use in the ASR back-end model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition

no code implementations • 2 Oct 2023 • Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini

This paper presents a novel framework for joint speaker diarization (SD) and automatic speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented recognition).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.