no code implementations • 2 Feb 2024 • Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
In this paper, we present the objective and subjective evaluations of the systems that were submitted to the CHiME-7 UDASE task, and we provide an analysis of the results.
no code implementations • 13 Dec 2023 • Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer
Instead of predicting body model parameters or 3D vertex coordinates, our focus is on forecasting the proposed discrete latent representation, which can be decoded into a registered human mesh.
no code implementations • 13 Jun 2023 • Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
This work builds on previous work on unsupervised speech enhancement that uses a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.
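For intuition only, here is a minimal NumPy sketch of the kind of NMF noise model this line of work relies on, fitted with the standard Itakura-Saito multiplicative updates; function name, rank `K`, and iteration count are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def is_nmf(V, K=8, n_iter=100, eps=1e-8, seed=0):
    """Approximate a noise power spectrogram V (freq x frames) as W @ H
    using Itakura-Saito multiplicative updates (Fevotte et al., 2009)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps   # spectral templates
    H = rng.random((K, N)) + eps   # temporal activations
    for _ in range(n_iter):
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))
    return W, H
```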
no code implementations • 9 Jun 2023 • Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Renaud Séguier
We introduce Motion-DVAE, a motion prior that captures the short-term dependencies of human motion.
no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Renaud Séguier
While fully supervised models have been shown to be effective for audiovisual speech emotion recognition (SER), the limited availability of labeled data remains a major challenge in the field.
no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.
no code implementations • 21 Apr 2023 • Samir Sadok, Simon Leglaive, Renaud Séguier
The VQ-MAE-S model is based on a masked autoencoder (MAE) that operates in the discrete latent space of a vector-quantized variational autoencoder.
Ranked #1 for Speech Emotion Recognition on the EmoDB dataset
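As a rough illustration of the two mechanisms VQ-MAE-S combines, the sketch below shows nearest-neighbour quantization against a VQ-VAE codebook and MAE-style random masking of the resulting discrete tokens. Shapes, the mask ratio, and the sentinel mask id are assumptions, not the paper's code:

```python
import torch

def quantize(z_e, codebook):
    """Nearest-neighbour lookup in a VQ-VAE codebook.
    z_e: (N, D) encoder outputs; codebook: (K, D) embeddings."""
    d = torch.cdist(z_e, codebook)   # (N, K) pairwise distances
    idx = d.argmin(dim=1)            # one discrete token per vector
    return idx, codebook[idx]

def random_mask(tokens, mask_ratio=0.75, mask_id=-1):
    """MAE-style masking: hide a random subset of 1-D discrete tokens."""
    n = tokens.numel()
    keep = torch.randperm(n)[: int(mask_ratio * n)]
    masked = tokens.clone()
    masked[keep] = mask_id           # masked positions to be reconstructed
    return masked, keep
```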
no code implementations • 30 Mar 2023 • Matthieu Delmas, Amine Kacete, Stéphane Paquelet, Simon Leglaive, Renaud Séguier
The proposed method leverages the structure of the latent space of StyleGAN to learn a lightweight classification model.
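A minimal sketch of what such a lightweight classifier on latent codes could look like, using synthetic stand-ins for StyleGAN W-space vectors (the data, dimensionality, and choice of logistic regression are all assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 512))    # stand-in for 512-dim W-space latent codes
y = rng.integers(0, 2, size=200)   # stand-in binary labels
clf = LogisticRegression(max_iter=1000).fit(Z, y)
print(clf.score(Z, y))             # training accuracy of the linear probe
```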
no code implementations • 7 Mar 2023 • Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
Dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extend the VAE to model a sequence of observed data together with a corresponding sequence of latent vectors.
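In its most general form, a DVAE factorizes the joint distribution of the observed and latent sequences as

```latex
p(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}, z_{1:t}) \, p(z_t \mid x_{1:t-1}, z_{1:t-1}),
```

with specific DVAE models obtained by dropping some of the conditioning variables.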
1 code implementation • 14 Apr 2022 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies. We show that these subspaces are orthogonal and, based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.
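As a hedged sketch of the general idea (not the paper's method), one could identify a factor's subspace by PCA on latent codes of synthetic speech where only that factor varies, then move a latent code within that subspace while leaving its orthogonal complement untouched; names and dimensions are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

def factor_subspace(Z, n_dims=3):
    """Z: (N, D) latent codes varying in a single speech factor.
    Returns an orthonormal basis U of shape (D, n_dims)."""
    return PCA(n_components=n_dims).fit(Z).components_.T

def move_in_subspace(z, U, delta):
    """Shift latent code z (D,) along the factor subspace spanned by U
    by displacement delta (n_dims,); other directions are unchanged."""
    return z + U @ delta
```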
no code implementations • 4 Apr 2022 • Xiaoyu Bie, Wen Guo, Simon Leglaive, Laurent Girin, Francesc Moreno-Noguer, Xavier Alameda-Pineda
Studies on the automatic processing of 3D human pose data have flourished in recent years.
1 code implementation • 23 Jun 2021 • Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin
We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior, pre-trained on clean speech signals, with a noise model based on nonnegative matrix factorization (NMF), and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement.
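In this line of work, the short-time Fourier transform coefficients of the mixture are typically modeled as the sum of circularly-symmetric complex Gaussian speech and noise components, with the speech variance produced by the (D)VAE decoder and the noise variance given by the NMF factorization:

```latex
x_{fn} = s_{fn} + b_{fn}, \qquad
s_{fn} \sim \mathcal{N}_c\big(0, \sigma_f^2(\mathbf{z}_n)\big), \qquad
b_{fn} \sim \mathcal{N}_c\big(0, (\mathbf{W}\mathbf{H})_{fn}\big),
```

and the enhanced speech is then recovered with a Wiener-like filter built from these estimated variances.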
1 code implementation • 28 Aug 2020 • Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda
Recently, a series of papers has presented different extensions of the VAE that process sequential data, modeling not only the latent space but also the temporal dependencies within a sequence of data vectors and the corresponding latent vectors, relying on recurrent neural networks or state-space models.
no code implementations • 24 Oct 2019 • Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE).
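For intuition, a recurrent VAE over spectrogram frames can be sketched as follows; this is a minimal illustrative model, and all layer choices and dimensions are assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class RVAE(nn.Module):
    """Minimal recurrent VAE sketch over spectrogram frames (B, T, x_dim)."""
    def __init__(self, x_dim=513, z_dim=16, h_dim=128):
        super().__init__()
        self.enc_rnn = nn.GRU(x_dim, h_dim, batch_first=True)
        self.enc_out = nn.Linear(h_dim, 2 * z_dim)   # mean and log-variance
        self.dec_rnn = nn.GRU(z_dim, h_dim, batch_first=True)
        self.dec_out = nn.Linear(h_dim, x_dim)       # log speech variance

    def forward(self, x):
        h, _ = self.enc_rnn(x)
        mu, logvar = self.enc_out(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        g, _ = self.dec_rnn(z)
        return self.dec_out(g), mu, logvar
```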
no code implementations • 7 Aug 2019 • Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
Variational autoencoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.
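Training a VAE maximizes the evidence lower bound (ELBO) on the data log-likelihood:

```latex
\mathcal{L}(\theta, \phi; x) =
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
- \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
```

where $q_\phi(z \mid x)$ is the inference (encoder) network and $p_\theta(x \mid z)$ the generative (decoder) network.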
no code implementations • 8 Feb 2019 • Simon Leglaive, Umut Simsekli, Antoine Liutkus, Laurent Girin, Radu Horaud
This paper focuses on single-channel semi-supervised speech enhancement.
1 code implementation • 5 Feb 2019 • Simon Leglaive, Laurent Girin, Radu Horaud
In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach.
no code implementations • 16 Nov 2018 • Simon Leglaive, Laurent Girin, Radu Horaud
In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments.