To the best of our knowledge, CL-MASR is the first continual learning benchmark for the multilingual ASR task.
We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio.
We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish).
In this paper, we study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio.
In this paper, we explore self-supervised learning (SSL) for analyzing a first-of-its-kind database of cry recordings from more than a thousand newborns, annotated with clinical indications.
This paper describes the Ubenwa CryCeleb dataset - a labeled collection of infant cries - and the accompanying CryCeleb 2023 task, which is a public speaker verification challenge based on cry sounds.
In this paper, we introduce a new approach, called Posthoc Interpretation via Quantization (PIQ), for interpreting decisions made by trained classifiers.
In this paper, we work on a sound recognition system that continually incorporates new sound classes.
In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!.
First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures.
4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio
SpeechBrain is an open-source and all-in-one speech toolkit.
Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.
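A toy numpy sketch of the multi-head attention computation that replaces recurrence: every pair of positions is compared in one batched matrix product, with no sequential state. The random projection matrices stand in for learned weights, and `multi_head_attention` is an illustrative name, not an API from any toolkit mentioned here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Self-attention over x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Random projections stand in for the learned Q/K/V/output weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))

    def split(h):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return h.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # all position pairs at once
    attn = softmax(scores, axis=-1)                      # each row sums to 1
    heads = attn @ V                                     # (num_heads, seq_len, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ Wo

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 16))
y = multi_head_attention(x, num_heads=4, rng=rng)
print(y.shape)  # (10, 16): same shape as the input, no recurrent loop over time
```

Unlike an RNN, nothing in this computation depends on processing position t-1 before position t, which is what makes the attention formulation parallelizable over the sequence.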
In the first step, we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracle masks is optimal.
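The idea of oracle masking in a transform domain can be illustrated with a fixed transform (the FFT) in place of the learned one. This is a minimal sketch, assuming a Wiener-style oracle mask built from ground-truth sources; the actual method learns the transform rather than fixing it.

```python
import numpy as np

rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(4096), rng.standard_normal(4096)  # two toy sources
mix = s1 + s2

# Transform to a latent (here: frequency) domain; the paper learns this transform.
S1, S2, M = np.fft.rfft(s1), np.fft.rfft(s2), np.fft.rfft(mix)

# Oracle Wiener-like mask computed from the ground-truth sources.
mask = np.abs(S1) ** 2 / (np.abs(S1) ** 2 + np.abs(S2) ** 2 + 1e-12)

# Apply the mask to the mixture and invert the transform.
est = np.fft.irfft(mask * M, n=len(mix))

mse = lambda a, b: np.mean((a - b) ** 2)
print(mse(est, s1) < mse(mix, s1))  # True: masking recovers s1 better than the raw mixture
```

The oracle mask gives an upper bound on how well masking can separate in a given domain, which is why optimizing the domain itself for oracle performance is a sensible first step.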
We show that, when incrementally refining a classifier with generative replay, a generator whose size is only 4% of all previous training data matches the performance obtained by retaining 20% of all previous training data.
Popular generative model learning methods, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), enforce the latent representation to follow simple distributions such as an isotropic Gaussian.
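In a VAE, for instance, the isotropic-Gaussian constraint is enforced through a KL divergence term with a closed form. A minimal numpy sketch of that term (function and variable names are illustrative, not from any specific codebase):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

# The KL vanishes exactly when the approximate posterior equals the prior ...
print(kl_to_standard_normal(np.zeros(8), np.zeros(8)))      # 0.0

# ... and grows as the posterior drifts away from N(0, I).
print(kl_to_standard_normal(np.full(8, 0.5), np.zeros(8)))  # 1.0
```

Minimizing this term alongside the reconstruction loss is what pulls the learned latent distribution toward the simple isotropic Gaussian prior.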