no code implementations • 13 Mar 2025 • Jacob Comeau, Mathieu Bazinet, Pascal Germain, Cem Subakan
Continual learning algorithms aim to learn from a sequence of tasks, making the training distribution non-stationary.
no code implementations • 11 Feb 2025 • Shubham Gupta, Zichao Li, Tianyi Chen, Cem Subakan, Siva Reddy, Perouz Taslakian, Valentina Zantedeschi
In this paper, we propose a tree-based method for organizing and representing reference documents at various levels of granularity, which offers the flexibility to balance cost and utility, and eases inspection of both the corpus content and the retrieval operations.
no code implementations • 6 Feb 2025 • Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli
Large language models have revolutionized natural language processing through self-supervised pretraining on massive datasets.
1 code implementation • 8 Jan 2025 • Anthony Deschênes, Rémi Georges, Cem Subakan, Bruna Ugulino, Antoine Henry, Michael Morin
Specifically, our convolutional autoencoder with skip connections (Skip-CAE) and our Skip-CAE transformer outperform the DCASE autoencoder baseline, one-class SVM, isolation forest and a published convolutional autoencoder architecture, respectively obtaining an area under the ROC curve of 0.846 and 0.875 on a dataset of real-factory planer sounds.
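The evaluation above scores each sound by its autoencoder reconstruction error and summarizes detection quality with the area under the ROC curve. The sketch below illustrates that metric generically (it is not the paper's Skip-CAE); `roc_auc` and the toy error values are illustrative assumptions.

```python
import numpy as np

def roc_auc(normal_scores, anomaly_scores):
    """AUC via the Mann-Whitney U identity: the probability that a
    randomly drawn anomaly scores higher than a randomly drawn normal."""
    normal = np.asarray(normal_scores, dtype=float)
    anomal = np.asarray(anomaly_scores, dtype=float)
    # Count pairs where the anomaly's score exceeds the normal's (ties count 0.5).
    greater = (anomal[:, None] > normal[None, :]).sum()
    ties = (anomal[:, None] == normal[None, :]).sum()
    return (greater + 0.5 * ties) / (len(anomal) * len(normal))

# Reconstruction errors: anomalous sounds should reconstruct poorly.
normal_err = np.array([0.10, 0.12, 0.08, 0.11])
anomaly_err = np.array([0.30, 0.25, 0.09])
print(round(roc_auc(normal_err, anomaly_err), 3))  # → 0.75
```

An AUC of 1.0 would mean every anomaly reconstructs worse than every normal sound; 0.5 is chance level.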
no code implementations • 12 Nov 2024 • Eleonora Mancini, Francesco Paissan, Paolo Torroni, Mirco Ravanelli, Cem Subakan
Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis.
no code implementations • 8 Oct 2024 • Fırat Öncel, Matthias Bethge, Beyza Ermis, Mirco Ravanelli, Cem Subakan, Çağatay Yıldız
Further token-level perplexity analysis reveals that the perplexity degradation is caused by a handful of tokens that are not informative about the domain.
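Token-level analysis of this kind can be sketched as follows: corpus perplexity is the exponential of the mean per-token negative log-likelihood, so each token's own NLL is its contribution, and a few high-NLL tokens can dominate the average. The helper names and toy log-probabilities below are illustrative, not from the paper.

```python
import math

def token_nlls(token_logprobs):
    """Per-token negative log-likelihoods (higher = more surprising)."""
    return [-lp for lp in token_logprobs]

def perplexity(token_logprobs):
    """Corpus perplexity: exp of the mean negative log-likelihood."""
    nll = token_nlls(token_logprobs)
    return math.exp(sum(nll) / len(nll))

# Toy log-probs: most tokens are easy; two surprising tokens dominate.
logprobs = [-0.1, -0.2, -0.1, -8.0, -0.15, -7.5]
print(f"perplexity = {perplexity(logprobs):.2f}")
print(f"two worst-token NLLs: {sorted(token_nlls(logprobs), reverse=True)[:2]}")
```

Dropping the two outlier tokens here would bring the perplexity down to roughly exp(0.1375) ≈ 1.15, which is the kind of gap the token-level inspection exposes.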
1 code implementation • 7 Oct 2024 • Shubham Gupta, Isaac Neri Gomez-Sarmiento, Faez Amjed Mezdari, Mirco Ravanelli, Cem Subakan
We propose a novel approach for humming transcription that combines a CNN-based architecture with a dynamic programming-based post-processing algorithm, utilizing the recently introduced HumTrans dataset.
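Dynamic-programming post-processing of frame-wise predictions can be sketched as a Viterbi-style decode that trades per-frame score against a penalty for switching notes, which removes spurious single-frame flips. This is a generic illustration under assumed names (`dp_smooth`, `switch_penalty`), not the paper's exact algorithm.

```python
import numpy as np

def dp_smooth(frame_scores, switch_penalty=1.0):
    """Viterbi-style decoding: choose the per-frame note sequence that
    maximizes total score minus a penalty each time the note changes."""
    T, N = frame_scores.shape
    best = frame_scores[0].copy()        # best cumulative score ending in each note
    back = np.zeros((T, N), dtype=int)   # backpointers
    for t in range(1, T):
        stay = best                              # no penalty for keeping the note
        switch = best.max() - switch_penalty     # best alternative, minus the penalty
        back[t] = np.where(stay >= switch, np.arange(N), best.argmax())
        best = np.maximum(stay, switch) + frame_scores[t]
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Frame 2 briefly flips to note 1; a high enough penalty smooths it away.
scores = np.array([[1, 0], [1, 0], [0, 1], [1, 0], [1, 0]], dtype=float)
print(dp_smooth(scores, switch_penalty=1.5))  # → [0, 0, 0, 0, 0]
print(dp_smooth(scores, switch_penalty=0.1))  # → [0, 0, 1, 0, 0]
```

With a low penalty the decode reduces to the raw per-frame argmax; raising it enforces temporal consistency.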
no code implementations • 13 Sep 2024 • Eleonora Mancini, Francesco Paissan, Mirco Ravanelli, Cem Subakan
Neural networks are typically black boxes that remain opaque with regard to their decision mechanisms.
no code implementations • 29 Jun 2024 • Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, Jarod Duret, Salima Mdhaffar, Gaelle Laperriere, Mickael Rouvier, Renato de Mori, Yannick Esteve
This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face.
no code implementations • 20 Jun 2024 • Pooneh Mousavi, Luca Della Libera, Jarod Duret, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli
Discrete audio tokens have recently gained considerable attention for their potential to connect audio and language processing, enabling the creation of modern multimodal large language models.
no code implementations • 15 Jun 2024 • Pooneh Mousavi, Jarod Duret, Salah Zaiem, Luca Della Libera, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli
Discrete audio tokens have recently gained attention for their potential to bridge the gap between audio and language processing.
no code implementations • 14 Jun 2024 • Shubham Gupta, Mirco Ravanelli, Pascal Germain, Cem Subakan
In this paper, we propose Phoneme Discretized Saliency Maps (PDSM), a discretization algorithm for saliency maps that takes advantage of phoneme boundaries for explainable detection of AI-generated voice.
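The core idea of discretizing a saliency map at phoneme boundaries can be sketched as segment-wise pooling followed by binarization: each phoneme is marked salient as a whole or not at all. The function name, pooling choice, and threshold below are assumptions for illustration; the actual PDSM algorithm may differ.

```python
import numpy as np

def phoneme_discretize(saliency, boundaries, threshold=0.5):
    """Pool a frame-level saliency map within phoneme segments, then
    binarize: a phoneme is 'salient' if its mean saliency passes the threshold."""
    out = np.zeros_like(saliency)
    for start, end in boundaries:        # [start, end) frame indices per phoneme
        if saliency[start:end].mean() >= threshold:
            out[start:end] = 1.0
    return out

saliency = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.6])
boundaries = [(0, 2), (2, 4), (4, 6)]    # three phonemes
print(phoneme_discretize(saliency, boundaries))  # → [1. 1. 0. 0. 1. 1.]
```

Pooling at phoneme granularity makes the explanation align with linguistically meaningful units instead of arbitrary frames.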
no code implementations • 27 May 2024 • Francesco Paissan, Luca Della Libera, Mirco Ravanelli, Cem Subakan
Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology.
no code implementations • 19 Mar 2024 • Francesco Paissan, Mirco Ravanelli, Cem Subakan
Despite the impressive performance of deep learning models across diverse tasks, their complexity poses challenges for interpretation.
no code implementations • 5 Feb 2024 • Luca Della Libera, Cem Subakan, Mirco Ravanelli
The increasing success of deep neural networks has raised concerns about their inherent black-box nature, posing challenges related to interpretability and trust.
1 code implementation • 25 Oct 2023 • Luca Della Libera, Pooneh Mousavi, Salah Zaiem, Cem Subakan, Mirco Ravanelli
To the best of our knowledge, CL-MASR is the first continual learning benchmark for the multilingual ASR task.
Tasks: Automatic Speech Recognition (ASR) (+2 more)
no code implementations • 19 Oct 2023 • Francesco Paissan, Luca Della Libera, Zhepei Wang, Mirco Ravanelli, Paris Smaragdis, Cem Subakan
We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio.
1 code implementation • 29 May 2023 • Juan Zuluaga-Gomez, Sara Ahmed, Danielius Visockas, Cem Subakan
We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish).
Tasks: Automatic Speech Recognition (ASR) (+2 more)
1 code implementation • 3 May 2023 • Zhepei Wang, Cem Subakan, Krishna Subramani, Junkai Wu, Tiago Tavares, Fabio Ayres, Paris Smaragdis
In this paper, we study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio.
no code implementations • 2 May 2023 • Arsenii Gorin, Cem Subakan, Sajjad Abdoli, Junhao Wang, Samantha Latremouille, Charles Onu
In this paper, we explore self-supervised learning (SSL) for analyzing a first-of-its-kind database of cry recordings from more than a thousand newborns, annotated with clinical indications.
2 code implementations • 1 May 2023 • David Budaghyan, Charles C. Onu, Arsenii Gorin, Cem Subakan, Doina Precup
This paper describes the Ubenwa CryCeleb dataset - a labeled collection of infant cries - and the accompanying CryCeleb 2023 task, which is a public speaker verification challenge based on cry sounds.
1 code implementation • 22 Mar 2023 • Francesco Paissan, Cem Subakan, Mirco Ravanelli
In this paper, we introduce a new approach, called Posthoc Interpretation via Quantization (PIQ), for interpreting decisions made by trained classifiers.
1 code implementation • 19 Jun 2022 • Luca Della Libera, Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin
Transformers have recently achieved state-of-the-art performance in speech separation.
1 code implementation • 15 May 2022 • Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, Paris Smaragdis
In this paper, we work on a sound recognition system that continually incorporates new sound classes.
1 code implementation • 6 Feb 2022 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi
In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!.
Ranked #1 on Speech Enhancement on WHAM!
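Separation and enhancement results on benchmarks like WHAM! are commonly reported in scale-invariant SNR (SI-SNR): the estimate is projected onto the target, and the projection's energy is compared to the residual's. The implementation below is a standard sketch of that metric, not code from the paper.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB: project the (zero-mean) estimate onto the
    target, then compare the projection's energy to the residual's."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    proj = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    noise = estimate - proj
    return 10 * np.log10((proj @ proj + eps) / (noise @ noise + eps))

t = np.sin(np.linspace(0, 8 * np.pi, 1000))
noisy = t + 0.1 * np.random.default_rng(0).standard_normal(1000)
print(f"clean estimate: {si_snr(t, t):.1f} dB, noisy estimate: {si_snr(noisy, t):.1f} dB")
```

Because of the projection step, rescaling the estimate leaves the metric unchanged, which is exactly why it is preferred over plain SNR for separation systems whose output gain is arbitrary.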
1 code implementation • 20 Oct 2021 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin
First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures.
4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio
SpeechBrain is an open-source and all-in-one speech toolkit.
4 code implementations • 25 Oct 2020 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong
Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.
Ranked #7 on Speech Separation on WSJ0-3mix
2 code implementations • 22 Oct 2019 • Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis
In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal.
Ranked #32 on Speech Separation on WSJ0-2mix
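Oracle masking in a learned latent space can be sketched with a toy invertible linear map standing in for the learned transform (the paper learns it; here a random matrix and its exact inverse are assumptions): the mixture is mapped to the latent space, a ratio mask computed from the known sources is applied, and the result is mapped back.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W_inv = np.linalg.inv(W)            # stand-in for the learned inverse transform

def to_latent(x):
    return W @ x

def from_latent(z):
    return W_inv @ z

s1, s2 = rng.standard_normal(8), rng.standard_normal(8)
mix = s1 + s2
z1, z2, zm = to_latent(s1), to_latent(s2), to_latent(mix)

# Oracle ratio mask computed from the (known) sources in latent space.
mask = np.abs(z1) / (np.abs(z1) + np.abs(z2) + 1e-8)
est1 = from_latent(mask * zm)
est2 = from_latent((1 - mask) * zm)
print(np.allclose(est1 + est2, mix))  # → True: the masks partition the mixture
```

With a linear transform the two masked estimates always sum back to the mixture; the point of learning the transform is to make such oracle masks separate the sources as cleanly as possible.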
no code implementations • 3 Jun 2019 • Zhepei Wang, Cem Subakan, Efthymios Tzinis, Paris Smaragdis, Laurent Charlin
We show that, by incrementally refining a classifier with generative replay, a generator whose size is only 4% of all previous training data matches the performance of refining the classifier while keeping 20% of all previous training data.
no code implementations • 12 Mar 2018 • Cem Subakan, Oluwasanmi Koyejo, Paris Smaragdis
Popular generative model learning methods such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) constrain the latent representation to follow simple distributions such as an isotropic Gaussian.
1 code implementation • 30 Oct 2017 • Cem Subakan, Paris Smaragdis
Generative source separation methods such as non-negative matrix factorization (NMF) or auto-encoders, rely on the assumption of an output probability density.
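NMF, mentioned above as a classic generative separation front-end, factorizes a non-negative spectrogram into a spectral basis and activations. The sketch below uses the standard Lee-Seung multiplicative updates on a toy matrix; it is a generic illustration, not the model proposed in the paper.

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9):
    """Non-negative matrix factorization V ≈ W @ H via Lee-Seung
    multiplicative updates, minimizing squared Euclidean error."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps      # spectral basis (frequency x rank)
    H = rng.random((rank, T)) + eps      # activations (rank x time)
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis
    return W, H

V = np.abs(np.random.default_rng(1).standard_normal((16, 32)))  # toy "spectrogram"
W, H = nmf(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

The multiplicative form keeps `W` and `H` non-negative by construction, which is what lets the factors be read as spectral templates and their gains over time.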
no code implementations • NeurIPS 2014 • Cem Subakan, Johannes Traa, Paris Smaragdis
In this paper, we propose a learning approach for the Mixture of Hidden Markov Models (MHMM) based on the Method of Moments (MoM).