1 code implementation • 14 Mar 2024 • Ilyass Moummad, Nicolas Farrugia, Romain Serizel, Jeremy Froidevaux, Vincent Lostanlen
Multi-label imbalanced classification poses a significant challenge in machine learning, particularly evident in bioacoustics where animal sounds often co-occur, and certain sounds are much less frequent than others.
1 code implementation • 25 Dec 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia
Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly in situations where abundant, unlabeled data is readily available at no cost.
2 code implementations • 5 Oct 2023 • Francesca Ronchini, Romain Serizel
In recent years, deep learning systems have shown a concerning trend toward increased complexity and higher energy consumption.
no code implementations • 19 Sep 2023 • Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel
Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods.
no code implementations • 19 Sep 2023 • Mostafa Sadeghi, Romain Serizel
Nevertheless, the iterative variational expectation-maximization (VEM) process involved at test time, which relies on a variational inference method, results in high computational complexity.
2 code implementations • 19 Sep 2023 • Berné Nortier, Mostafa Sadeghi, Romain Serizel
To address this issue, we introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
1 code implementation • 16 Sep 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia
Bioacoustic sound event detection enables a better understanding of animal behavior and better monitoring of biodiversity using audio.
1 code implementation • 2 Sep 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia
The bioacoustic community recast the problem of sound event detection within the framework of few-shot learning, i.e., training a system with only a few labeled examples.
no code implementations • 31 Jul 2023 • Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina
Speech enhancement in ad-hoc microphone arrays is often hindered by the lack of synchronization between the devices composing the microphone array.
1 code implementation • 27 Jun 2023 • Janek Ebbers, Reinhold Haeb-Umbach, Romain Serizel
It summarizes the system performance over a range of operating modes resulting from varying the decision threshold that is used to translate the system output scores into a binary detection output.
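The threshold sweep described above can be illustrated with a simplified sketch. It assumes plain clip-level binary decisions (a score at or above the threshold counts as a detection) and summarizes the resulting operating points with the standard ROC area under the curve; the event-based matching and multi-class aggregation used in sound event detection evaluation are not modelled, and the function names are hypothetical.

```python
from typing import List, Tuple


def operating_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, TPR, FPR) for every distinct score used as the decision threshold.

    Assumption: a clip is declared positive when its score >= threshold.
    Thresholds are visited from highest to lowest, so FPR and TPR never decrease.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, tp / n_pos, fp / n_neg))
    return points


def roc_auc(points: List[Tuple[float, float, float]]) -> float:
    """Summarize all operating points by the trapezoidal area under the ROC curve."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0  # the curve starts at (FPR=0, TPR=0)
    for _, tpr, fpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


pts = operating_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
print(pts)           # one (threshold, TPR, FPR) triple per threshold
print(roc_auc(pts))  # a single threshold-independent summary
```

Sweeping every observed score makes the summary independent of any single operating point, which is the motivation stated above.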
no code implementations • 2 Nov 2022 • Mostafa Sadeghi, Romain Serizel
Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel
A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable.
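To make the variance-based encoding concrete, the sketch below evaluates the log-density of independent zero-mean circular complex Gaussian coefficients, log N_c(s; 0, v) = -log(pi v) - |s|^2 / v. In a VAE-based model the variances would be produced by a decoder from the latent variable; here they are passed in directly, and the function name is hypothetical. Because the density depends on s only through |s|^2, the phase of each coefficient does not affect the likelihood.

```python
import math
from typing import Sequence


def complex_gaussian_loglik(coeffs: Sequence[complex], variances: Sequence[float]) -> float:
    """Sum of log N_c(s_f; 0, v_f) over bins assumed independent.

    For a zero-mean circular complex Gaussian with variance v:
        log p(s) = -log(pi * v) - |s|**2 / v
    Assumption: `variances` stands in for the decoder output sigma^2(z).
    """
    if len(coeffs) != len(variances):
        raise ValueError("coeffs and variances must have the same length")
    total = 0.0
    for s, v in zip(coeffs, variances):
        if v <= 0:
            raise ValueError("variances must be positive")
        total += -math.log(math.pi * v) - abs(s) ** 2 / v
    return total


# Same magnitude, different phase: identical log-likelihood.
print(complex_gaussian_loglik([1 + 0j], [2.0]))
print(complex_gaussian_loglik([1j], [2.0]))
```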
no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Romain Serizel
Deep latent variable generative models based on variational autoencoder (VAE) have shown promising performance for audiovisual speech enhancement (AVSE).
no code implementations • 14 Oct 2022 • Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis
The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset.
no code implementations • 3 Feb 2022 • Francesca Ronchini, Romain Serizel
A final experiment is proposed to study the impact of non-target events on system outputs.
1 code implementation • 31 Jan 2022 • Janek Ebbers, Romain Serizel, Reinhold Haeb-Umbach
Performing an adequate evaluation of sound event detection (SED) systems is far from trivial and is still subject to ongoing research.
1 code implementation • DCASE workshop 2021 • Félix Gontier, Romain Serizel, Christophe Cerisara
Automated audio captioning is the multimodal task of describing environmental audio recordings with fluent natural language.
Ranked #7 on Audio captioning on AudioCaps
1 code implementation • 28 Sep 2021 • Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell
Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge uses a heterogeneous dataset that includes both recorded and synthetic soundscapes.
1 code implementation • 15 Jun 2021 • Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina
Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene.
1 code implementation • 3 Nov 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid
Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions for speech understanding and speech recognition in noisy environments.
no code implementations • 2 Nov 2020 • Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey
We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types.
1 code implementation • 2 Nov 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid
We propose a distributed algorithm that can process spatial information in a spatially unconstrained microphone array.
no code implementations • 26 Oct 2020 • Giacomo Ferroni, Nicolas Turpault, Juan Azcarreta, Francesco Tuveri, Romain Serizel, Çağdaş Bilen, Sacha Krstulović
The ranking of sound event detection (SED) systems may be biased by assumptions inherent to evaluation criteria and to the choice of an operating point.
no code implementations • 26 Jul 2020 • Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent
Our primary submission to the challenge is the fusion of seven subsystems, which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set.
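The two metrics quoted above can be computed from lists of target (same-speaker) and non-target trial scores. The sketch below is a simplified illustration: the EER is approximated at the observed threshold where miss and false-alarm rates are closest (no interpolation), and the normalized DCF follows the usual convention of dividing by min(C_miss * P_target, C_fa * (1 - P_target)). The cost parameters `p_target`, `c_miss` and `c_fa` are placeholders, not the challenge's official settings, and the function names are hypothetical.

```python
from typing import List, Tuple


def error_rates(target: List[float], nontarget: List[float], thr: float) -> Tuple[float, float]:
    """Miss rate (targets rejected) and false-alarm rate (non-targets accepted) at `thr`."""
    p_miss = sum(1 for s in target if s < thr) / len(target)
    p_fa = sum(1 for s in nontarget if s >= thr) / len(nontarget)
    return p_miss, p_fa


def thresholds(target: List[float], nontarget: List[float]) -> List[float]:
    # Every observed score, plus +inf so that "reject every trial" is also considered.
    return sorted(set(target) | set(nontarget)) + [float("inf")]


def approx_eer(target: List[float], nontarget: List[float]) -> float:
    """EER approximated at the observed threshold where the two error rates are closest."""
    best_gap, eer = float("inf"), 1.0
    for thr in thresholds(target, nontarget):
        p_miss, p_fa = error_rates(target, nontarget, thr)
        if abs(p_miss - p_fa) < best_gap:
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
    return eer


def min_dcf(target: List[float], nontarget: List[float],
            p_target: float = 0.01, c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Normalized detection cost, minimized over all candidate thresholds."""
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    best = float("inf")
    for thr in thresholds(target, nontarget):
        p_miss, p_fa = error_rates(target, nontarget, thr)
        dcf = (c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa) / norm
        best = min(best, dcf)
    return best


print(approx_eer([0.6, 0.4], [0.5, 0.3]))
print(min_dcf([0.6, 0.4], [0.5, 0.3], p_target=0.5))
```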
no code implementations • 11 May 2020 • Michel Olvera, Emmanuel Vincent, Romain Serizel, Gilles Gasso
Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background.
no code implementations • 13 Feb 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid
Multichannel processing is widely used for speech enhancement, but several limitations arise when deploying these solutions in the real world.
1 code implementation • 5 Feb 2020 • Nicolas Turpault, Romain Serizel, Emmanuel Vincent
Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains unclear. Indeed, weak labels must often be dealt with at the same time as other challenges, namely multiple labels per sample, unbalanced classes and/or overlapping events. In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges.
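The difference between strong and weak labels can be sketched with a small, hypothetical example: a strong label records an event's onset, offset and class, from which frame-level training targets can be derived, whereas a weak label only records which classes occur somewhere in the clip. The `StrongLabel` type, the helper names and the frame hop are illustrative assumptions, not the dataset's actual format.

```python
import math
from typing import List, NamedTuple, Set


class StrongLabel(NamedTuple):
    onset: float   # event start, in seconds
    offset: float  # event end, in seconds
    event: str     # class name


def to_weak(strong: List[StrongLabel]) -> Set[str]:
    """Weak (clip-level) label: which classes are present, with all timing discarded."""
    return {label.event for label in strong}


def to_frame_targets(strong: List[StrongLabel], classes: List[str],
                     clip_duration: float, hop: float) -> List[List[int]]:
    """Strong (frame-level) targets: one multi-hot row per analysis frame.

    A frame is marked active if it overlaps the event at all (floor on onset,
    ceil on offset); overlapping events simply activate several columns.
    """
    n_frames = int(round(clip_duration / hop))
    targets = [[0] * len(classes) for _ in range(n_frames)]
    for label in strong:
        col = classes.index(label.event)
        first = max(0, int(math.floor(label.onset / hop)))
        last = min(n_frames, int(math.ceil(label.offset / hop)))
        for frame in range(first, last):
            targets[frame][col] = 1
    return targets


clip = [StrongLabel(0.0, 0.5, "Speech"), StrongLabel(0.5, 1.0, "Dog")]
print(to_weak(clip))                                     # classes only
print(to_frame_targets(clip, ["Dog", "Speech"], 1.0, 0.25))  # per-frame activity
```

Swapping the order of the two events changes the frame-level targets but leaves the weak label unchanged, which is exactly the information a weakly labeled dataset does not provide.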
no code implementations • 20 Nov 2019 • Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert
We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise.
no code implementations • 6 Nov 2019 • Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team.
1 code implementation • 26 Oct 2019 • Nicolas Turpault, Romain Serizel, Ankit Shah, Justin Salamon
This paper presents Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge and provides a first analysis of the challenge results.
Ranked #8 on Sound Event Detection on DESED
1 code implementation • 1 Jul 2017 • Ziteng Wang, Emmanuel Vincent, Romain Serizel, Yonghong Yan
Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques which can improve speech recognition performance.
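As an illustration of the GEV beamformer mentioned above, the sketch below computes, for a single frequency bin, the filter that maximizes the output SNR (w^H Phi_ss w) / (w^H Phi_nn w): the principal generalized eigenvector of the speech and noise spatial covariance matrices. It assumes both covariance matrices are already known (in practice they must be estimated), requires NumPy, does not cover the MWF, and is a textbook formulation rather than the method of the paper; the function names are hypothetical.

```python
import numpy as np


def gev_weights(phi_ss: np.ndarray, phi_nn: np.ndarray) -> np.ndarray:
    """Principal generalized eigenvector of (phi_ss, phi_nn) for one frequency bin.

    Solves phi_ss w = lambda phi_nn w via the eigenvectors of inv(phi_nn) @ phi_ss
    and keeps the one with the largest eigenvalue, i.e. the maximum output SNR.
    """
    vals, vecs = np.linalg.eig(np.linalg.solve(phi_nn, phi_ss))
    w = vecs[:, np.argmax(vals.real)]
    return w / np.linalg.norm(w)  # the SNR criterion is scale-invariant


def output_snr(w: np.ndarray, phi_ss: np.ndarray, phi_nn: np.ndarray) -> float:
    """Ratio of beamformed speech power to beamformed noise power."""
    return float(np.real(w.conj() @ phi_ss @ w) / np.real(w.conj() @ phi_nn @ w))


# Toy two-microphone example: rank-1 speech covariance, uncorrelated noise
# that is stronger on the first microphone.
d = np.array([1.0, 1j])
phi_ss = np.outer(d, d.conj())
phi_nn = np.diag([2.0, 0.5]).astype(complex)
w = gev_weights(phi_ss, phi_nn)
print(output_snr(w, phi_ss, phi_nn))
```

For a rank-1 speech covariance d d^H, the maximum achievable SNR is d^H Phi_nn^{-1} d, which in this toy example equals 1/2 + 1/0.5 = 2.5.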