Search Results for author: Laurent Girin

Found 22 papers, 4 papers with code

Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation

no code implementations • 7 Dec 2023 • Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda

In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources.

Audio Source Separation Multi-Object Tracking +1

Unsupervised speech enhancement with deep dynamical generative speech and noise models

no code implementations • 13 Jun 2023 • Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.

Speech Enhancement

A multimodal dynamical variational autoencoder for audiovisual speech representation learning

no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.

Disentanglement Image Denoising +2

Speech Modeling with a Hierarchical Transformer Dynamical VAE

no code implementations • 7 Mar 2023 • Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors.

Speech Enhancement
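As a sketch of the DVAE family described above (standard DVAE notation, not quoted from the paper), the most general causal DVAE factorizes a sequence of observations $x_{1:T}$ and latent vectors $z_{1:T}$ as:

```latex
p(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}, z_{1:t}) \, p(z_t \mid x_{1:t-1}, z_{1:t-1})
```

Specific members of the family (such as the hierarchical Transformer DVAE above) are obtained by restricting which past observations and latent vectors each conditional distribution is allowed to depend on.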

BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model

no code implementations • 4 Jul 2022 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

We collect a corpus of utterances containing contrastive focus and we evaluate the accuracy of a BERT model, finetuned to predict quantized acoustic prominence features, on these samples.

Language Modelling Speech Synthesis +1

Learning and controlling the source-filter representation of speech with a variational autoencoder

1 code implementation • 14 Apr 2022 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies. We show that these subspaces are orthogonal and, based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.

Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

no code implementations • 5 Apr 2022 • Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters, a DNN-based internal forward model predicting the sensory consequences of articulatory commands, and an internal inverse model based on a recurrent neural network recovering articulatory commands from the acoustic speech input.

Self-Supervised Learning

Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder

no code implementations • 18 Feb 2022 • Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda

In this paper, we present an unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE), called DVAE-UMOT.

Multi-Object Tracking Multiple Object Tracking +3

A Survey of Sound Source Localization with Deep Learning Methods

no code implementations • 8 Sep 2021 • Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, Alexandre Guérin

This article is a survey on deep learning methods for single and multiple sound source localization.

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

1 code implementation • 23 Jun 2021 • Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin

We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement.

Representation Learning Speech Enhancement +2
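The NMF noise model mentioned above approximates a nonnegative (power) spectrogram as a low-rank product of two nonnegative matrices. The sketch below uses generic Euclidean multiplicative updates (Lee-Seung) on synthetic data; the paper embeds NMF inside a variational EM loop with an Itakura-Saito-type criterion, which this minimal illustration does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, rank, n_iter=300, eps=1e-9):
    """Factorize V ≈ W @ H with nonnegative W, H via Euclidean
    multiplicative updates (each update keeps entries nonnegative
    and does not increase the reconstruction error)."""
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "noise spectrogram": exactly rank 3 by construction,
# so a rank-3 factorization can fit it closely.
V = rng.random((16, 3)) @ rng.random((3, 40))
W, H = nmf(V, rank=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the speech enhancement setting, `V` would be the noise power spectrogram and the low rank encodes the assumption that the noise has a simple spectral structure.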

What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

no code implementations • 4 Sep 2020 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

In this paper, we study the behavior of a neural sequence-to-sequence TTS system used in an incremental mode, i.e., when generating speech output for token n, the system has access to n + k tokens from the text sequence.

Sentence Speech Synthesis +1

Dynamical Variational Autoencoders: A Comprehensive Review

1 code implementation • 28 Aug 2020 • Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda

Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models.

3D Human Dynamics Resynthesis +2

A Recurrent Variational Autoencoder for Speech Enhancement

no code implementations • 24 Oct 2019 • Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE).

Speech Enhancement

Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders

no code implementations • 7 Aug 2019 • Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.

Speech Enhancement
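The VAEs described above rely on the reparameterization trick: sampling the Gaussian latent variable is rewritten as a deterministic function of independent noise, so gradients can flow through the sampling step during training. A minimal NumPy illustration (shapes and names are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps,
    where eps ~ N(0, I); mu and log_var would come from the encoder."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# 20000 draws from N(0, 4) per dimension: mean ≈ 0, std ≈ 2.
mu = np.zeros((20000, 2))
log_var = np.log(np.full((20000, 2), 4.0))
z = reparameterize(mu, log_var)
```

In a full VAE, `mu` and `log_var` are network outputs and the same trick makes the Monte Carlo estimate of the reconstruction term differentiable with respect to the encoder parameters.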

A variance modeling framework based on variational autoencoders for speech enhancement

1 code implementation • 5 Feb 2019 • Simon Leglaive, Laurent Girin, Radu Horaud

In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach.

Speech Enhancement

Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers

no code implementations • 28 Sep 2018 • Yutong Ban, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

We propose a variational inference model which amounts to approximating the joint distribution with a factorized distribution.

Bayesian Inference Variational Inference +1

Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models

no code implementations • 11 Jun 2018 • Fanny Roche, Thomas Hueber, Samuel Limier, Laurent Girin

This study investigates the use of non-linear unsupervised dimensionality reduction techniques to compress a music dataset into a low-dimensional representation which can be used in turn for the synthesis of new sounds.

Audio and Speech Processing Sound

Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

no code implementations • 12 Aug 2014 • Antoine Deleforge, Radu Horaud, Yoav Schechner, Laurent Girin

Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence spatially align the audio and visual modalities, making it possible to discriminate between speaking and non-speaking faces.

regression
