Search Results for author: Romain Serizel

Found 31 papers, 16 papers with code

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds

1 code implementation • 14 Mar 2024 • Ilyass Moummad, Nicolas Farrugia, Romain Serizel, Jeremy Froidevaux, Vincent Lostanlen

Multi-label imbalanced classification poses a significant challenge in machine learning, particularly evident in bioacoustics where animal sounds often co-occur, and certain sounds are much less frequent than others.

imbalanced classification Multi-Label Classification

Self-Supervised Learning for Few-Shot Bird Sound Classification

1 code implementation • 25 Dec 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly in situations where abundant, unlabeled data is readily available at no cost.

Classification Few-Shot Learning +2

Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems

2 code implementations • 5 Oct 2023 • Francesca Ronchini, Romain Serizel

In recent years, deep learning systems have shown a concerning trend toward increased complexity and higher energy consumption.

Event Detection Sound Event Detection

Posterior sampling algorithms for unsupervised speech enhancement with recurrent variational autoencoder

no code implementations • 19 Sep 2023 • Mostafa Sadeghi, Romain Serizel

Nevertheless, the involved iterative variational expectation-maximization (VEM) process at test time, which relies on a variational inference method, results in high computational complexity.

Computational Efficiency Speech Enhancement +1

Unsupervised speech enhancement with diffusion-based generative models

2 code implementations • 19 Sep 2023 • Berné Nortier, Mostafa Sadeghi, Romain Serizel

To address this issue, we introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.

Speech Enhancement

Diffusion-based speech enhancement with a weighted generative-supervised learning loss

no code implementations • 19 Sep 2023 • Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods.

Speech Enhancement

Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection

1 code implementation • 16 Sep 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Bioacoustic sound event detection allows for better understanding of animal behavior and for better monitoring of biodiversity using audio.

Event Detection Few-Shot Learning +1

Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning

1 code implementation • 2 Sep 2023 • Ilyass Moummad, Romain Serizel, Nicolas Farrugia

The bioacoustic community recast the problem of sound event detection within the framework of few-shot learning, i.e., training a system with only a few labeled examples.

Contrastive Learning Data Augmentation +3

SAMbA: Speech enhancement with Asynchronous ad-hoc Microphone Arrays

no code implementations • 31 Jul 2023 • Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array.

Speech Enhancement

Post-Processing Independent Evaluation of Sound Event Detection Systems

1 code implementation • 27 Jun 2023 • Janek Ebbers, Reinhold Haeb-Umbach, Romain Serizel

It summarizes the system performance over a range of operating modes resulting from varying the decision threshold that is used to translate the system output scores into a binary detection output.

Event Detection Sound Event Detection

Audio-visual speech enhancement with a deep Kalman filter generative model

no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Romain Serizel

Deep latent variable generative models based on variational autoencoder (VAE) have shown promising performance for audiovisual speech enhancement (AVSE).

Speech Enhancement

A weighted-variance variational autoencoder model for speech enhancement

no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel

A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable.

Speech Enhancement

Fast and efficient speech enhancement with variational autoencoders

no code implementations • 2 Nov 2022 • Mostafa Sadeghi, Romain Serizel

Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.

Computational Efficiency Speech Enhancement +1

Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system

no code implementations • 14 Oct 2022 • Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis

The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset.

Event Segmentation

Threshold Independent Evaluation of Sound Event Detection Scores

1 code implementation • 31 Jan 2022 • Janek Ebbers, Romain Serizel, Reinhold Haeb-Umbach

Performing an adequate evaluation of sound event detection (SED) systems is far from trivial and is still subject to ongoing research.

Event Detection Sound Event Detection

The impact of non-target events in synthetic soundscapes for sound event detection

1 code implementation • 28 Sep 2021 • Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell

The Detection and Classification of Acoustic Scenes and Events Challenge 2021 Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes.

Event Detection Sound Event Detection

Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes

1 code implementation • 15 Jun 2021 • Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene.

Speech Enhancement

DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

1 code implementation • 3 Nov 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments.

Noise Estimation Speech Enhancement +2

Distributed speech separation in spatially unconstrained microphone arrays

1 code implementation • 2 Nov 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

We propose a distributed algorithm that can process spatial information in a spatially unconstrained microphone array.

Speech Separation

What's All the FUSS About Free Universal Sound Separation Data?

no code implementations • 2 Nov 2020 • Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types.

Data Augmentation

Improving Sound Event Detection Metrics: Insights from DCASE 2020

no code implementations • 26 Oct 2020 • Giacomo Ferroni, Nicolas Turpault, Juan Azcarreta, Francesco Tuveri, Romain Serizel, Çagdaş Bilen, Sacha Krstulović

The ranking of sound event detection (SED) systems may be biased by assumptions inherent to evaluation criteria and to the choice of an operating point.

Event Detection Sound Event Detection +1

UIAI System for Short-Duration Speaker Verification Challenge 2020

no code implementations • 26 Jul 2020 • Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent

Our primary submission to the challenge is the fusion of seven subsystems, which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set.

Text-Dependent Speaker Verification

Foreground-Background Ambient Sound Scene Separation

no code implementations • 11 May 2020 • Michel Olvera, Emmanuel Vincent, Romain Serizel, Gilles Gasso

Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background.

DNN-Based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays

no code implementations • 13 Feb 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Multichannel processing is widely used for speech enhancement, but several limitations appear when trying to deploy these solutions in the real world.

Speech Enhancement

Limitations of weak labels for embedding and tagging

1 code implementation • 5 Feb 2020 • Nicolas Turpault, Romain Serizel, Emmanuel Vincent

Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains unclear. Indeed, weak labels must often be dealt with at the same time as other challenges, namely multiple labels per sample, unbalanced classes and/or overlapping events. In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges.

Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise

no code implementations • 20 Nov 2019 • Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert

We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise.

Sound event detection in domestic environments with weakly labeled data and soundscape synthesis

1 code implementation • 26 Oct 2019 • Nicolas Turpault, Romain Serizel, Ankit Shah, Justin Salamon

This paper presents Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge and provides a first analysis of the challenge results.

Event Detection Sound Event Detection

Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

1 code implementation • 1 Jul 2017 • Ziteng Wang, Emmanuel Vincent, Romain Serizel, Yonghong Yan

Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques which can improve speech recognition performance.

speech-recognition Speech Recognition
