no code implementations • 9 Jan 2025 • Samir Sadok, Simon Leglaive, Laurent Girin, Gaël Richard, Xavier Alameda-Pineda
This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within a single model.
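The paper's exact architecture is not reproduced here, but the masked-autoencoder principle it builds on is simple to sketch: hide part of the input representation and train the model to reconstruct it. A minimal illustration in PyTorch, where all names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class MaskedAutoencoder(nn.Module):
    """Toy masked autoencoder over a sequence of feature frames.
    Illustrative only; not the AnCoGen architecture."""
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                       dim_feedforward=hidden,
                                       batch_first=True),
            num_layers=2)
        self.head = nn.Linear(feat_dim, feat_dim)

    def forward(self, x, mask):
        # x: (batch, frames, feat_dim); mask: (batch, frames) booleans,
        # True where the frame is hidden from the encoder.
        x = torch.where(mask.unsqueeze(-1), self.mask_token, x)
        return self.head(self.encoder(x))

# Training step: reconstruct only the masked frames.
model = MaskedAutoencoder()
x = torch.randn(4, 100, 80)            # e.g. mel-spectrogram frames
mask = torch.rand(4, 100) < 0.5        # hide 50% of the frames
loss = ((model(x, mask) - x)[mask] ** 2).mean()
loss.backward()
```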
no code implementations • 30 May 2024 • Ihab Asaad, Maxime Jacquelin, Olivier Perrotin, Laurent Girin, Thomas Hueber
In the present study, we investigate the use of a speech SSL model for speech inpainting, i.e., reconstructing a missing portion of a speech signal from its surrounding context, thereby fulfilling a downstream task that is very similar to the pretext task.
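Speech inpainting recasts that pretext task with a contiguous missing span scored in isolation. A hypothetical sketch, in which the GRU encoder stands in for the actual SSL model and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical setup: a bidirectional context encoder (stand-in for a
# pretrained speech SSL model) fills in a contiguous missing span.
encoder = nn.GRU(input_size=80, hidden_size=256, batch_first=True,
                 bidirectional=True)
decoder = nn.Linear(512, 80)

x = torch.randn(1, 200, 80)          # feature frames of one utterance
lo, hi = 80, 120                     # missing span [lo, hi)
x_in = x.clone()
x_in[:, lo:hi] = 0.0                 # hide the span from the encoder

ctx, _ = encoder(x_in)               # bidirectional context around the gap
x_hat = decoder(ctx)

# The inpainting loss is computed on the missing span only.
loss = nn.functional.l1_loss(x_hat[:, lo:hi], x[:, lo:hi])
```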
no code implementations • 7 Dec 2023 • Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda
In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources.
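The assignment of observations to dynamical components follows the usual mixture-model E-step; a minimal sketch with placeholder log-likelihoods standing in for the per-component DVAE likelihoods:

```python
import numpy as np

# E-step of a mixture model: soft-assign each observation to a component.
# log_lik[n, k] would be the log-likelihood of observation n under the
# k-th (here: DVAE) component; random values stand in for the real model.
rng = np.random.default_rng(0)
log_lik = rng.normal(size=(500, 3))          # 500 observations, 3 sources
log_prior = np.log(np.full(3, 1.0 / 3.0))    # uniform mixture weights

log_post = log_lik + log_prior
log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
resp = np.exp(log_post)
resp /= resp.sum(axis=1, keepdims=True)           # responsibilities

assert np.allclose(resp.sum(axis=1), 1.0)
```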
no code implementations • 13 Jun 2023 • Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
This work builds on previous work on unsupervised speech enhancement that uses a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.
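The NMF noise model in this line of work is classically fitted with multiplicative updates; below is a standard standalone sketch under the Itakura-Saito divergence, which is common for power spectrograms (the DVAE speech prior is omitted):

```python
import numpy as np

def is_nmf(V, K=8, n_iter=200, eps=1e-12):
    """Itakura-Saito NMF of a power spectrogram V (freq x time) with the
    standard multiplicative updates (Fevotte et al., 2009)."""
    rng = np.random.default_rng(0)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        V_hat = W @ H + eps
        W *= ((V * V_hat**-2) @ H.T) / (V_hat**-1 @ H.T)
        V_hat = W @ H + eps
        H *= (W.T @ (V * V_hat**-2)) / (W.T @ V_hat**-1)
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(257, 100))) ** 2
W, H = is_nmf(V)      # the noise spectrogram is approximated by W @ H
```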
no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.
no code implementations • 7 Mar 2023 • Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors.
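A minimal illustrative DVAE instance (one of many possible, not a specific model from the paper) lets a recurrent network carry the latent dynamics, with each observation decoded from the current latent vector:

```python
import torch
import torch.nn as nn

class TinyDVAEGenerator(nn.Module):
    """Minimal DVAE-style generative model: z_t depends on z_{<t} through
    an RNN state, and x_t is decoded from z_t. Illustrative instance."""
    def __init__(self, z_dim=16, x_dim=80, h_dim=64):
        super().__init__()
        self.rnn = nn.GRUCell(z_dim, h_dim)
        self.prior = nn.Linear(h_dim, 2 * z_dim)   # mean and log-variance
        self.decoder = nn.Linear(z_dim, x_dim)

    @torch.no_grad()
    def sample(self, T):
        h = torch.zeros(1, self.rnn.hidden_size)
        z = torch.zeros(1, self.prior.out_features // 2)
        xs = []
        for _ in range(T):
            h = self.rnn(z, h)                     # summarizes z_{<t}
            mu, logvar = self.prior(h).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            xs.append(self.decoder(z))
        return torch.stack(xs, dim=1)              # (1, T, x_dim)

x = TinyDVAEGenerator().sample(T=50)
```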
no code implementations • 4 Jul 2022 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
We collect a corpus of utterances containing contrastive focus, and we evaluate the accuracy of a BERT model, fine-tuned to predict quantized acoustic prominence features, on these samples.
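Predicting quantized prominence labels per token is a standard token-classification setup; a sketch with Hugging Face Transformers, where the number of bins and the model checkpoint are assumptions:

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# Quantized prominence: e.g. 4 bins per token (the bin count is assumed).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained("bert-base-uncased",
                                                   num_labels=4)

enc = tokenizer("I said the RED car, not the blue one",
                return_tensors="pt")
# One prominence bin per wordpiece (dummy labels here; real labels would
# come from aligned acoustic features, with focused words in higher bins).
labels = torch.randint(0, 4, enc["input_ids"].shape)

out = model(**enc, labels=labels)   # cross-entropy over prominence bins
out.loss.backward()
```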
1 code implementation • 14 Apr 2022 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies. We show that these subspaces are orthogonal and, building on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.
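The subspace-identification idea can be sketched with plain PCA on latent codes where a single factor is varied; the toy version below uses synthetic data as a stand-in and may differ from the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32                                   # latent dimension (assumed)

# Latent codes of synthesizer outputs where only f0 was varied; here a
# synthetic stand-in with variation concentrated along one hidden direction.
true_dir = rng.normal(size=D)
true_dir /= np.linalg.norm(true_dir)
Z = rng.normal(size=(200, 1)) * true_dir + 0.05 * rng.normal(size=(200, D))

# PCA: the top principal component spans the (here 1-D) f0 subspace.
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
u = Vt[0]                                # unit vector spanning the subspace

# Control: shift a latent code along u while leaving the orthogonal
# complement, and hence the other factors, untouched.
z = rng.normal(size=D)
z_shifted = z + 1.5 * u
```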
no code implementations • 5 Apr 2022 • Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber
We propose a computational model of speech production combining three components: a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters; a DNN-based internal forward model predicting the sensory consequences of articulatory commands; and an internal inverse model, based on a recurrent neural network, recovering articulatory commands from the acoustic speech input.
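The forward/inverse pair can be sketched as two small networks; the layer sizes and dimensions below are placeholders, not those of the paper:

```python
import torch
import torch.nn as nn

ART, AC = 10, 80     # articulatory and acoustic dimensions (assumed)

# Internal forward model: articulatory commands -> predicted acoustics.
forward_model = nn.Sequential(nn.Linear(ART, 128), nn.Tanh(),
                              nn.Linear(128, AC))

# Internal inverse model: acoustic input -> recovered articulatory
# commands, with a recurrent layer to exploit temporal context.
class InverseModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(AC, 128, batch_first=True)
        self.out = nn.Linear(128, ART)
    def forward(self, x):
        h, _ = self.gru(x)
        return self.out(h)

inverse_model = InverseModel()
acoustics = torch.randn(1, 100, AC)      # a heard utterance
commands = inverse_model(acoustics)      # (1, 100, ART)
predicted = forward_model(commands)      # closes the perception loop
```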
no code implementations • 18 Feb 2022 • Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda
In this paper, we present DVAE-UMOT, an unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE).
no code implementations • 8 Sep 2021 • Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, Alexandre Guérin
This article is a survey on deep learning methods for single and multiple sound source localization.
1 code implementation • 23 Jun 2021 • Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin
We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement.
no code implementations • 7 Apr 2021 • Marc-Antoine Georges, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber
It is increasingly considered that human speech perception and production both rely on articulatory representations.
no code implementations • 19 Feb 2021 • Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier
The prosody of a spoken word is determined by its surrounding context.
no code implementations • 4 Sep 2020 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
In this paper, we study the behavior of a neural sequence-to-sequence TTS system used in an incremental mode, i.e., when generating speech output for token n, the system has access to n + k tokens from the text sequence.
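The incremental policy is easy to state in code: when synthesizing token n, the model sees only the first n + k tokens. A schematic loop in which the synthesis call is a placeholder:

```python
def incremental_tts(tokens, synthesize_prefix, k=2):
    """Generate speech for each token n with access to tokens[: n + k]
    only. `synthesize_prefix` is a placeholder for the actual TTS model."""
    chunks = []
    for n in range(len(tokens)):
        visible = tokens[: n + 1 + k]    # current token plus k lookahead
        chunks.append(synthesize_prefix(visible, target_index=n))
    return chunks

# Toy usage with a dummy synthesizer that just echoes its target token.
audio = incremental_tts("the quick brown fox".split(),
                        lambda ctx, target_index: ctx[target_index])
```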
1 code implementation • 28 Aug 2020 • Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda
Recently, a series of papers has presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and the corresponding latent vectors, relying on recurrent neural networks or state-space models.
no code implementations • 24 Oct 2019 • Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE).
no code implementations • 7 Aug 2019 • Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.
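As background, the textbook VAE objective combines a reconstruction term with a KL regularizer toward a standard Gaussian prior; this sketch shows the generic formulation, not the specific enhancement model studied in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Textbook VAE with Gaussian prior and posterior (background sketch)."""
    def __init__(self, x_dim=80, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)    # -> mean, log-variance
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam.
        x_hat = self.dec(z)
        # Negative ELBO = reconstruction error + KL(q(z|x) || N(0, I)).
        rec = F.mse_loss(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kl

loss = VAE()(torch.randn(32, 80))
loss.backward()
```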
no code implementations • 8 Feb 2019 • Simon Leglaive, Umut Simsekli, Antoine Liutkus, Laurent Girin, Radu Horaud
This paper focuses on single-channel semi-supervised speech enhancement.
1 code implementation • 5 Feb 2019 • Simon Leglaive, Laurent Girin, Radu Horaud
In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach.
no code implementations • 16 Nov 2018 • Simon Leglaive, Laurent Girin, Radu Horaud
In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments.
no code implementations • 28 Sep 2018 • Yutong Ban, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
We propose a variational inference model which amounts to approximating the joint distribution with a factorized distribution.
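The factorized approximation referred to here is the standard mean-field construction: for latent variables $z_1, \dots, z_K$, the joint posterior is approximated by a product of factors, each of which has a closed-form coordinate update:

```latex
q(z_1, \dots, z_K) = \prod_{k=1}^{K} q_k(z_k),
\qquad
\log q_k^*(z_k) = \mathbb{E}_{q_{\neq k}}\!\left[ \log p(x, z_1, \dots, z_K) \right] + \mathrm{const.}
```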
no code implementations • 11 Jun 2018 • Fanny Roche, Thomas Hueber, Samuel Limier, Laurent Girin
This study investigates the use of non-linear unsupervised dimensionality reduction techniques to compress a music dataset into a low-dimensional representation which can be used in turn for the synthesis of new sounds.
no code implementations • 12 Aug 2014 • Antoine Deleforge, Radu Horaud, Yoav Schechner, Laurent Girin
Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus making it possible to discriminate between speaking and non-speaking faces.