Search Results for author: Samuele Cornell

Found 33 papers, 14 papers with code

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

no code implementations • 14 Sep 2024 • Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe

We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models.

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition

1 code implementation • 17 Aug 2024 • Samuele Cornell, Jordan Darefsky, Zhiyao Duan, Shinji Watanabe

In this work, we propose a synthetic data generation pipeline for multi-speaker conversational ASR, leveraging a large language model (LLM) for content creation and a conversational multi-speaker text-to-speech (TTS) model for speech synthesis.

Language Modeling • Language Modelling +6

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

no code implementations • 19 Jun 2024 • Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian

In this paper, we propose to use discriminative scores from discriminative models in the first steps of the reverse diffusion process (RDP).

Speech Enhancement

DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels

no code implementations • 12 Jun 2024 • Samuele Cornell, Janek Ebbers, Constance Douwes, Irene Martín-Morató, Manu Harju, Annamaria Mesaros, Romain Serizel

The Detection and Classification of Acoustic Scenes and Events Challenge Task 4 aims to advance sound event detection (SED) systems in domestic environments by leveraging training data with different supervision uncertainty.

Event Detection • Missing Labels +1

One model to rule them all? Towards End-to-End Joint Speaker Diarization and Speech Recognition

no code implementations • 2 Oct 2023 • Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini

This paper presents a novel framework for joint speaker diarization (SD) and automatic speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented recognition).

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +4

A Time-Frequency Generative Adversarial based method for Audio Packet Loss Concealment

1 code implementation • 28 Jul 2023 • Carlo Aironi, Samuele Cornell, Luca Serafini, Stefano Squartini

Packet loss is a major cause of voice quality degradation in VoIP transmissions with serious impact on intelligibility and user experience.

Image-to-Image Translation • Packet Loss Concealment +1

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

no code implementations • 18 Apr 2023 • Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain.

Speech Enhancement
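The FSB-LSTM architecture itself is not reproduced in this snippet, but the STFT-domain enhancement setting it operates in can be illustrated with a minimal, hedged sketch (not the authors' model): frame the waveform, apply a time-frequency mask predicted by some network (here a stand-in callable), and resynthesize by overlap-add.

```python
import numpy as np

def stft(x, win=256, hop=128):
    """Frame the signal with a Hann window and take the real FFT of each frame."""
    w = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (frames, bins)

def istft(X, win=256, hop=128):
    """Overlap-add inverse with a matching Hann synthesis window."""
    w = np.hanning(win)
    frames = np.fft.irfft(X, n=win, axis=-1) * w
    out = np.zeros((X.shape[0] - 1) * hop + win)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + win] += f
        norm[i * hop:i * hop + win] += w ** 2
    return out / np.maximum(norm, 1e-8)

def enhance(noisy, mask_fn, win=256, hop=128):
    """Apply a T-F mask in [0, 1]; `mask_fn` stands in for the enhancement network."""
    X = stft(noisy, win, hop)
    M = mask_fn(np.abs(X))  # (frames, bins)
    return istft(X * M, win, hop)

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
# An all-ones (identity) mask reconstructs the interior of x almost exactly.
y = enhance(x, lambda mag: np.ones_like(mag))
```

Algorithmic latency in such systems is tied to the synthesis window and hop sizes, which is why low-latency variants shrink these parameters.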

Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

no code implementations • 15 Feb 2023 • Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono

To address the challenges encountered in the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, to multi-channel, causal speech enhancement, and large improvements are observed by replacing the TCNDenseNet used in iNeuBe with this new architecture; (2) we leverage a recent dual-window-size approach with future-frame prediction to ensure that iNeuBe-X satisfies the 5 ms constraint on algorithmic latency required by CEC2; (3) we introduce a novel speaker-conditioning branch for TF-GridNet to achieve target speaker extraction; (4) we propose a fine-tuning step, where we compute an additional loss with respect to the target speaker signal compensated with the listener audiogram.

Speaker Separation • Speech Enhancement +1

Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system

no code implementations • 14 Oct 2022 • Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis

The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset.

Event Segmentation

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

1 code implementation • 19 Jul 2022 • Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe

To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +5

Conversational Speech Separation: an Evaluation Study for Streaming Applications

no code implementations • 31 May 2022 • Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini

Continuous speech separation (CSS) is a recently proposed framework which aims at separating each speaker from an input mixture signal in a streaming fashion.

Speech Separation
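The streaming idea behind CSS — process a long recording in overlapped windows and stitch the per-window outputs back together — can be sketched independently of any particular separation model. The sketch below is a hedged illustration, not the paper's system; the separator itself is omitted and only the chunking/stitching machinery is shown (plain averaging is used in the overlap so the round trip stays exact; real CSS systems typically cross-fade).

```python
import numpy as np

def sliding_windows(x, win, hop):
    """Split a long signal into overlapped chunks, zero-padding the tail."""
    n = max(1, int(np.ceil((len(x) - win) / hop)) + 1)
    pad = (n - 1) * hop + win - len(x)
    x = np.pad(x, (0, max(0, pad)))
    return np.stack([x[i * hop:i * hop + win] for i in range(n)])

def overlap_add(chunks, hop, length):
    """Stitch per-chunk outputs back into one stream by averaging the overlaps."""
    win = chunks.shape[1]
    out = np.zeros((len(chunks) - 1) * hop + win)
    cnt = np.zeros_like(out)
    for i, c in enumerate(chunks):
        out[i * hop:i * hop + win] += c
        cnt[i * hop:i * hop + win] += 1.0
    return (out / cnt)[:length]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
chunks = sliding_windows(x, win=320, hop=160)      # a separator would run per chunk here
y = overlap_add(chunks, hop=160, length=len(x))    # round trip recovers x
```

In a streaming deployment, latency is governed by the window length and hop, which is exactly the trade-off such evaluation studies probe.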

Exploring Self-Attention Mechanisms for Speech Separation

1 code implementation • 6 Feb 2022 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi

In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!.

Denoising • Speech Enhancement +1

Learning Filterbanks for End-to-End Acoustic Beamforming

no code implementations • 8 Nov 2021 • Samuele Cornell, Manuel Pariente, François Grondin, Stefano Squartini

We perform a detailed analysis using the recent Clarity Challenge data and show that by using learnt filterbanks it is possible to surpass oracle-mask based beamforming for short windows.
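The contrast between a fixed STFT front-end and a learnt filterbank can be made concrete with a small, hedged sketch (not the paper's beamformer): analysis is a bank of FIR filters applied with a stride, and the STFT is the special case where those filters are windowed sinusoids; a learnt filterbank simply makes them free parameters.

```python
import numpy as np

def analysis_fb(x, filters, hop):
    """Apply a bank of FIR analysis filters with stride `hop`;
    each output column is one (possibly learnt) frequency channel."""
    L = filters.shape[1]
    n = 1 + (len(x) - L) // hop
    frames = np.stack([x[i * hop:i * hop + L] for i in range(n)])  # (n, L)
    return frames @ filters.T                                       # (n, n_filters)

# STFT-like initialization: windowed cosines. A learnt filterbank would
# instead optimize `filters` end-to-end with the downstream objective.
L, F, hop = 64, 32, 32
t = np.arange(L)
stft_like = np.stack([np.cos(2 * np.pi * k * t / L) * np.hanning(L)
                      for k in range(F)])
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
feats = analysis_fb(x, stft_like, hop)
```

Shorter filters (small L) reduce algorithmic latency, which is why learnt filterbanks can outperform fixed STFT analysis at the short windows the abstract mentions.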

The impact of non-target events in synthetic soundscapes for sound event detection

1 code implementation • 28 Sep 2021 • Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell

The Detection and Classification of Acoustic Scenes and Events Challenge 2021 Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes.

Event Detection • Sound Event Detection

Graph-based Representation of Audio signals for Sound Event Classification

no code implementations • 2021 29th European Signal Processing Conference (EUSIPCO) • Carlo Aironi, Samuele Cornell, Emanuele Principi, Stefano Squartini

In recent years there has been a considerable rise in interest towards graph representation and learning techniques, especially in cases where data intrinsically has a graph-like structure: social networks, molecular lattices, or semantic interactions, to name a few.

Learning to Rank Microphones for Distant Speech Recognition

1 code implementation • 6 Apr 2021 • Samuele Cornell, Alessio Brutti, Marco Matassoni, Stefano Squartini

Fully exploiting ad-hoc microphone networks for distant speech recognition is still an open issue.

channel selection • Decoder +3

Attention is All You Need in Speech Separation

4 code implementations • 25 Oct 2020 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong

Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.

Speech Separation
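The multi-head attention mechanism that replaces recurrence in Transformer-based separators can be sketched in a few lines of numpy. This is a generic illustration of scaled dot-product attention, not the SepFormer architecture itself; the projection matrices `Wq`, `Wk`, `Wv`, `Wo` and the toy dimensions are illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product attention over the time axis, split into heads."""
    T, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                      # (T, d) each
    split = lambda M: M.reshape(T, n_heads, dh).transpose(1, 0, 2)  # (heads, T, dh)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    A = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, T, T)
    out = (A @ Vh).transpose(1, 0, 2).reshape(T, d)        # concat heads
    return out @ Wo

rng = np.random.default_rng(0)
T, d, H = 10, 16, 4
X = rng.standard_normal((T, d))
W = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)]
Y = multi_head_attention(X, *W, n_heads=H)
```

Because every frame attends to every other frame in one step, such layers sidestep the sequential dependency of RNNs, at the cost of attention's quadratic complexity in sequence length.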

LibriMix: An Open-Source Dataset for Generalizable Speech Separation

5 code implementations • 22 May 2020 • Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent

Most deep learning-based speech separation models today are benchmarked on the wsj0-2mix dataset.

Audio and Speech Processing
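The core operation behind a mixture dataset like LibriMix is combining source utterances at a controlled signal-to-noise ratio. The helper below is a hedged, generic sketch of that step (the function name and parameters are illustrative, not LibriMix's actual generation scripts).

```python
import numpy as np

def mix_at_snr(s1, s2, snr_db):
    """Scale s2 so that s1 sits `snr_db` dB above it, then sum the two.
    Returns the mixture and the gain applied to s2."""
    p1 = np.mean(s1 ** 2)
    p2 = np.mean(s2 ** 2)
    g = np.sqrt(p1 / (p2 * 10 ** (snr_db / 10)))
    return s1 + g * s2, g

rng = np.random.default_rng(0)
a = rng.standard_normal(8000)   # stand-in for speaker 1's utterance
b = rng.standard_normal(8000)   # stand-in for speaker 2's utterance
mix, g = mix_at_snr(a, b, 5.0)  # speaker 1 is 5 dB above speaker 2
```

Repeating this over many utterance pairs (and optionally adding a noise track the same way) yields the paired clean/mixture examples that separation models train and evaluate on.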

Filterbank design for end-to-end speech separation

2 code implementations • 23 Oct 2019 • Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent

Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.

Speaker Recognition • Speech Separation
