Search Results for author: Romain Serizel

Found 31 papers, 16 papers with code

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds

1 code implementation • 14 Mar 2024 • Ilyass Moummad, Nicolas Farrugia, Romain Serizel, Jeremy Froidevaux, Vincent Lostanlen

Multi-label imbalanced classification poses a significant challenge in machine learning, particularly evident in bioacoustics where animal sounds often co-occur, and certain sounds are much less frequent than others.

imbalanced classification Multi-Label Classification

Self-Supervised Learning for Few-Shot Bird Sound Classification

1 code implementation • 25 Dec 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly in situations where abundant, unlabeled data is readily available at no cost.

Classification Few-Shot Learning +2

Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems

2 code implementations • 5 Oct 2023 • Francesca Ronchini, Romain Serizel

In recent years, deep learning systems have shown a concerning trend toward increased complexity and higher energy consumption.

Event Detection Sound Event Detection

Diffusion-based speech enhancement with a weighted generative-supervised learning loss

no code implementations • 19 Sep 2023 • Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods.

Speech Enhancement

Posterior sampling algorithms for unsupervised speech enhancement with recurrent variational autoencoder

no code implementations • 19 Sep 2023 • Mostafa Sadeghi, Romain Serizel

Nevertheless, the involved iterative variational expectation-maximization (VEM) process at test time, which relies on a variational inference method, results in high computational complexity.

Computational Efficiency Speech Enhancement +1

Unsupervised speech enhancement with diffusion-based generative models

2 code implementations • 19 Sep 2023 • Berné Nortier, Mostafa Sadeghi, Romain Serizel

To address this issue, we introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.

Speech Enhancement

Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection

1 code implementation • 16 Sep 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Bioacoustic sound event detection allows for a better understanding of animal behavior and for better monitoring of biodiversity using audio.

Event Detection Few-Shot Learning +1

Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning

1 code implementation • 2 Sep 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia

The bioacoustic community recast the problem of sound event detection within the framework of few-shot learning, i.e., training a system with only a few labeled examples.

Contrastive Learning Data Augmentation +3

SAMbA: Speech enhancement with Asynchronous ad-hoc Microphone Arrays

no code implementations • 31 Jul 2023 • Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array.

Speech Enhancement

Post-Processing Independent Evaluation of Sound Event Detection Systems

1 code implementation • 27 Jun 2023 • Janek Ebbers, Reinhold Haeb-Umbach, Romain Serizel

It summarizes the system performance over a range of operating modes resulting from varying the decision threshold that is used to translate the system output scores into a binary detection output.

Event Detection Sound Event Detection

Fast and efficient speech enhancement with variational autoencoders

no code implementations • 2 Nov 2022 • Mostafa Sadeghi, Romain Serizel

Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.

Computational Efficiency Speech Enhancement +1

A weighted-variance variational autoencoder model for speech enhancement

no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel

A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable.

Speech Enhancement

Audio-visual speech enhancement with a deep Kalman filter generative model

no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Romain Serizel

Deep latent variable generative models based on variational autoencoder (VAE) have shown promising performance for audiovisual speech enhancement (AVSE).

Speech Enhancement

Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system

no code implementations • 14 Oct 2022 • Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis

The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset.

Event Segmentation

A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes

no code implementations • 3 Feb 2022 • Francesca Ronchini, Romain Serizel

A last experiment is proposed in order to study the impact of non-target events on systems outputs.

Data Augmentation Event Detection +3

Threshold Independent Evaluation of Sound Event Detection Scores

1 code implementation • 31 Jan 2022 • Janek Ebbers, Romain Serizel, Reinhold Haeb-Umbach

Performing an adequate evaluation of sound event detection (SED) systems is far from trivial and is still subject to ongoing research.

Event Detection Sound Event Detection

AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS

1 code implementation • DCASE workshop 2021 • Félix Gontier, Romain Serizel, Christophe Cerisara

Automated audio captioning is the multimodal task of describing environmental audio recordings with fluent natural language.

Ranked #7 on Audio captioning on AudioCaps

AudioCaps Audio captioning +2

The impact of non-target events in synthetic soundscapes for sound event detection

1 code implementation • 28 Sep 2021 • Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell

The Detection and Classification of Acoustic Scenes and Events Challenge 2021 Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes.

Event Detection Sound Event Detection

Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes

1 code implementation • 15 Jun 2021 • Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene.

Speech Enhancement

DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

1 code implementation • 3 Nov 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments.

Noise Estimation Speech Enhancement +2

What's All the FUSS About Free Universal Sound Separation Data?

no code implementations • 2 Nov 2020 • Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types.

Data Augmentation

Distributed speech separation in spatially unconstrained microphone arrays

1 code implementation • 2 Nov 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

We propose a distributed algorithm that can process spatial information in a spatially unconstrained microphone array.

Speech Separation

Improving Sound Event Detection Metrics: Insights from DCASE 2020

no code implementations • 26 Oct 2020 • Giacomo Ferroni, Nicolas Turpault, Juan Azcarreta, Francesco Tuveri, Romain Serizel, Çagdaş Bilen, Sacha Krstulović

The ranking of sound event detection (SED) systems may be biased by assumptions inherent to evaluation criteria and to the choice of an operating point.

Event Detection Sound Event Detection +1

UIAI System for Short-Duration Speaker Verification Challenge 2020

no code implementations • 26 Jul 2020 • Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent

Our primary submission to the challenge is the fusion of seven subsystems, which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set.

Text-Dependent Speaker Verification

Foreground-Background Ambient Sound Scene Separation

no code implementations • 11 May 2020 • Michel Olvera, Emmanuel Vincent, Romain Serizel, Gilles Gasso

Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background.

DNN-Based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays

no code implementations • 13 Feb 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Multichannel processing is widely used for speech enhancement, but several limitations appear when trying to deploy these solutions in the real world.

Speech Enhancement

Limitations of weak labels for embedding and tagging

1 code implementation • 5 Feb 2020 • Nicolas Turpault, Romain Serizel, Emmanuel Vincent

Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains unclear. Indeed, weak labels must often be dealt with at the same time as other challenges, namely multiple labels per sample, unbalanced classes and/or overlapping events. In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges.

Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise

no code implementations • 20 Nov 2019 • Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert

We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise.

The Speed Submission to DIHARD II: Contributions & Lessons Learned

no code implementations • 6 Nov 2019 • Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras

This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team.

Action Detection Activity Detection +4

Sound event detection in domestic environments with weakly labeled data and soundscape synthesis

1 code implementation • 26 Oct 2019 • Nicolas Turpault, Romain Serizel, Ankit Shah, Justin Salamon

This paper presents Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge and provides a first analysis of the challenge results.

Ranked #8 on Sound Event Detection on DESED

Event Detection Sound Event Detection

Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

1 code implementation • 1 Jul 2017 • Ziteng Wang, Emmanuel Vincent, Romain Serizel, Yonghong Yan

Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques which can improve speech recognition performance.

speech-recognition Speech Recognition
