no code implementations • 2 Feb 2024 • Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
In this paper, we present the objective and subjective evaluations of the systems that were submitted to the CHiME-7 UDASE task, and we provide an analysis of the results.
no code implementations • 13 Dec 2023 • Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer
Instead of predicting body model parameters or 3D vertex coordinates, our focus is on forecasting the proposed discrete latent representation, which can be decoded into a registered human mesh.
no code implementations • 13 Jun 2023 • Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
This work builds on previous work on unsupervised speech enhancement that uses a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.
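For intuition only, here is a minimal NumPy sketch of the kind of NMF noise model this line of work relies on, fitted with the standard Itakura-Saito multiplicative updates; function name, rank `K`, and iteration count are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def is_nmf(V, K=8, n_iter=100, eps=1e-8, seed=0):
    """Approximate a noise power spectrogram V (freq x frames) as W @ H
    using Itakura-Saito multiplicative updates (Fevotte et al., 2009)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps   # spectral templates
    H = rng.random((K, N)) + eps   # temporal activations
    for _ in range(n_iter):
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))
    return W, H
```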
no code implementations • 9 Jun 2023 • Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Renaud Séguier
We introduce Motion-DVAE, a motion prior that captures the short-term dependencies of human motion.
no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Renaud Séguier
While fully supervised models have been shown to be effective for audiovisual speech emotion recognition (SER), the limited availability of labeled data remains a major challenge in the field.
no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.
no code implementations • 21 Apr 2023 • Samir Sadok, Simon Leglaive, Renaud Séguier
The VQ-MAE-S model is based on a masked autoencoder (MAE) that operates in the discrete latent space of a vector-quantized variational autoencoder.
Ranked #1 for Speech Emotion Recognition on the EmoDB dataset
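As a rough illustration of the two mechanisms VQ-MAE-S combines, the sketch below shows nearest-neighbour quantization against a VQ-VAE codebook and MAE-style random masking of the resulting discrete tokens. Shapes, the mask ratio, and the sentinel mask id are assumptions, not the paper's code:

```python
import torch

def quantize(z_e, codebook):
    """Nearest-neighbour lookup in a VQ-VAE codebook.
    z_e: (N, D) encoder outputs; codebook: (K, D) embeddings."""
    d = torch.cdist(z_e, codebook)   # (N, K) pairwise distances
    idx = d.argmin(dim=1)            # one discrete token per vector
    return idx, codebook[idx]

def random_mask(tokens, mask_ratio=0.75, mask_id=-1):
    """MAE-style masking: hide a random subset of 1-D discrete tokens."""
    n = tokens.numel()
    keep = torch.randperm(n)[: int(mask_ratio * n)]
    masked = tokens.clone()
    masked[keep] = mask_id           # masked positions to be reconstructed
    return masked, keep
```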
no code implementations • 30 Mar 2023 • Matthieu Delmas, Amine Kacete, Stéphane Paquelet, Simon Leglaive, Renaud Séguier
The proposed method leverages the structure of the latent space of StyleGAN to learn a lightweight classification model.
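A minimal sketch of what such a lightweight classifier on latent codes could look like, using synthetic stand-ins for StyleGAN W-space vectors (the data, dimensionality, and choice of logistic regression are all assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 512))    # stand-in for 512-dim W-space latent codes
y = rng.integers(0, 2, size=200)   # stand-in binary labels
clf = LogisticRegression(max_iter=1000).fit(Z, y)
print(clf.score(Z, y))             # training accuracy of the linear probe
```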
no code implementations • 7 Mar 2023 • Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
Dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extend the VAE to model a sequence of observed data together with a corresponding sequence of latent vectors.
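In its most general form, a DVAE factorizes the joint distribution of the observed and latent sequences as

```latex
p(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}, z_{1:t}) \, p(z_t \mid x_{1:t-1}, z_{1:t-1}),
```

with specific DVAE models obtained by dropping some of the conditioning variables.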
1 code implementation • 14 Apr 2022 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies. We show that these subspaces are orthogonal and, based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.
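As a hedged sketch of the general idea (not the paper's method), one could identify a factor's subspace by PCA on latent codes of synthetic speech where only that factor varies, then move a latent code within that subspace while leaving its orthogonal complement untouched; names and dimensions are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

def factor_subspace(Z, n_dims=3):
    """Z: (N, D) latent codes varying in a single speech factor.
    Returns an orthonormal basis U of shape (D, n_dims)."""
    return PCA(n_components=n_dims).fit(Z).components_.T

def move_in_subspace(z, U, delta):
    """Shift latent code z (D,) along the factor subspace spanned by U
    by displacement delta (n_dims,); other directions are unchanged."""
    return z + U @ delta
```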
no code implementations • 4 Apr 2022 • Xiaoyu Bie, Wen Guo, Simon Leglaive, Laurent Girin, Francesc Moreno-Noguer, Xavier Alameda-Pineda
Studies on the automatic processing of 3D human pose data have flourished in recent years.
1 code implementation • 23 Jun 2021 • Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin
We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior, pre-trained on clean speech signals, with a noise model based on nonnegative matrix factorization (NMF), and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement.
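In this line of work, the short-time Fourier transform coefficients of the mixture are typically modeled as the sum of circularly-symmetric complex Gaussian speech and noise components, with the speech variance produced by the (D)VAE decoder and the noise variance given by the NMF factorization:

```latex
x_{fn} = s_{fn} + b_{fn}, \qquad
s_{fn} \sim \mathcal{N}_c\big(0, \sigma_f^2(\mathbf{z}_n)\big), \qquad
b_{fn} \sim \mathcal{N}_c\big(0, (\mathbf{W}\mathbf{H})_{fn}\big),
```

and the enhanced speech is then recovered with a Wiener-like filter built from these estimated variances.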
1 code implementation • 28 Aug 2020 • Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda
Recently, a series of papers has presented different extensions of the VAE that process sequential data, modeling not only the latent space but also the temporal dependencies within a sequence of data vectors and the corresponding latent vectors, relying on recurrent neural networks or state-space models.
no code implementations • 24 Oct 2019 • Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE).
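For intuition, a recurrent VAE over spectrogram frames can be sketched as follows; this is a minimal illustrative model, and all layer choices and dimensions are assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class RVAE(nn.Module):
    """Minimal recurrent VAE sketch over spectrogram frames (B, T, x_dim)."""
    def __init__(self, x_dim=513, z_dim=16, h_dim=128):
        super().__init__()
        self.enc_rnn = nn.GRU(x_dim, h_dim, batch_first=True)
        self.enc_out = nn.Linear(h_dim, 2 * z_dim)   # mean and log-variance
        self.dec_rnn = nn.GRU(z_dim, h_dim, batch_first=True)
        self.dec_out = nn.Linear(h_dim, x_dim)       # log speech variance

    def forward(self, x):
        h, _ = self.enc_rnn(x)
        mu, logvar = self.enc_out(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        g, _ = self.dec_rnn(z)
        return self.dec_out(g), mu, logvar
```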
no code implementations • 7 Aug 2019 • Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
Variational autoencoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.
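Training a VAE maximizes the evidence lower bound (ELBO) on the data log-likelihood:

```latex
\mathcal{L}(\theta, \phi; x) =
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
- \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
```

where $q_\phi(z \mid x)$ is the inference (encoder) network and $p_\theta(x \mid z)$ the generative (decoder) network.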
no code implementations • 8 Feb 2019 • Simon Leglaive, Umut Simsekli, Antoine Liutkus, Laurent Girin, Radu Horaud
This paper focuses on single-channel semi-supervised speech enhancement.
1 code implementation • 5 Feb 2019 • Simon Leglaive, Laurent Girin, Radu Horaud
In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach.
no code implementations • 16 Nov 2018 • Simon Leglaive, Laurent Girin, Radu Horaud
In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments.