Search Results for author: Laurent Girin

Found 22 papers, 4 papers with code

Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation

no code implementations • 7 Dec 2023 • Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda

In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources.

Audio Source Separation Multi-Object Tracking +1

Unsupervised speech enhancement with deep dynamical generative speech and noise models

no code implementations • 13 Jun 2023 • Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.

Speech Enhancement

A multimodal dynamical variational autoencoder for audiovisual speech representation learning

no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.

Disentanglement Image Denoising +2

Speech Modeling with a Hierarchical Transformer Dynamical VAE

no code implementations • 7 Mar 2023 • Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors.

Speech Enhancement
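As a sketch of the DVAE family described above (standard DVAE notation, not quoted from the paper), the most general causal DVAE factorizes a sequence of observations $x_{1:T}$ and latent vectors $z_{1:T}$ as:

```latex
p(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}, z_{1:t}) \, p(z_t \mid x_{1:t-1}, z_{1:t-1})
```

Specific members of the family (such as the hierarchical Transformer DVAE above) are obtained by restricting which past observations and latent vectors each conditional distribution is allowed to depend on.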

BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model

no code implementations • 4 Jul 2022 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

We collect a corpus of utterances containing contrastive focus and we evaluate the accuracy of a BERT model, finetuned to predict quantized acoustic prominence features, on these samples.

Language Modelling Speech Synthesis +1

Learning and controlling the source-filter representation of speech with a variational autoencoder

1 code implementation • 14 Apr 2022 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies. We show that these subspaces are orthogonal and, based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.

Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

no code implementations • 5 Apr 2022 • Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters, a DNN-based internal forward model predicting the sensory consequences of articulatory commands, and an internal inverse model based on a recurrent neural network recovering articulatory commands from the acoustic speech input.

Self-Supervised Learning

Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder

no code implementations • 18 Feb 2022 • Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda

In this paper, we present an unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE), called DVAE-UMOT.

Multi-Object Tracking Multiple Object Tracking +3

A Survey of Sound Source Localization with Deep Learning Methods

no code implementations • 8 Sep 2021 • Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, Alexandre Guérin

This article is a survey on deep learning methods for single and multiple sound source localization.

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

1 code implementation • 23 Jun 2021 • Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin

We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement.

Representation Learning Speech Enhancement +2
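The NMF noise model mentioned above approximates a nonnegative (power) spectrogram as a low-rank product of two nonnegative matrices. The sketch below uses generic Euclidean multiplicative updates (Lee-Seung) on synthetic data; the paper embeds NMF inside a variational EM loop with an Itakura-Saito-type criterion, which this minimal illustration does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, rank, n_iter=300, eps=1e-9):
    """Factorize V ≈ W @ H with nonnegative W, H via Euclidean
    multiplicative updates (each update keeps entries nonnegative
    and does not increase the reconstruction error)."""
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "noise spectrogram": exactly rank 3 by construction,
# so a rank-3 factorization can fit it closely.
V = rng.random((16, 3)) @ rng.random((3, 40))
W, H = nmf(V, rank=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the speech enhancement setting, `V` would be the noise power spectrogram and the low rank encodes the assumption that the noise has a simple spectral structure.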

What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

no code implementations • 4 Sep 2020 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

In this paper, we study the behavior of a neural sequence-to-sequence TTS system used in an incremental mode, i.e., when generating speech output for token n, the system has access to n + k tokens from the text sequence.

Sentence Speech Synthesis +1

Dynamical Variational Autoencoders: A Comprehensive Review

1 code implementation • 28 Aug 2020 • Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda

Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models.

3D Human Dynamics Resynthesis +2

A Recurrent Variational Autoencoder for Speech Enhancement

no code implementations • 24 Oct 2019 • Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE).

Speech Enhancement

Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders

no code implementations • 7 Aug 2019 • Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.

Speech Enhancement
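The VAEs described above rely on the reparameterization trick: sampling the Gaussian latent variable is rewritten as a deterministic function of independent noise, so gradients can flow through the sampling step during training. A minimal NumPy illustration (shapes and names are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps,
    where eps ~ N(0, I); mu and log_var would come from the encoder."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# 20000 draws from N(0, 4) per dimension: mean ≈ 0, std ≈ 2.
mu = np.zeros((20000, 2))
log_var = np.log(np.full((20000, 2), 4.0))
z = reparameterize(mu, log_var)
```

In a full VAE, `mu` and `log_var` are network outputs and the same trick makes the Monte Carlo estimate of the reconstruction term differentiable with respect to the encoder parameters.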

A variance modeling framework based on variational autoencoders for speech enhancement

1 code implementation • 5 Feb 2019 • Simon Leglaive, Laurent Girin, Radu Horaud

In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach.

Speech Enhancement

Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers

no code implementations • 28 Sep 2018 • Yutong Ban, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

We propose a variational inference model which amounts to approximating the joint distribution with a factorized distribution.

Bayesian Inference Variational Inference +1

Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models

no code implementations • 11 Jun 2018 • Fanny Roche, Thomas Hueber, Samuel Limier, Laurent Girin

This study investigates the use of non-linear unsupervised dimensionality reduction techniques to compress a music dataset into a low-dimensional representation which can be used in turn for the synthesis of new sounds.

Audio and Speech Processing Sound

Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

no code implementations • 12 Aug 2014 • Antoine Deleforge, Radu Horaud, Yoav Schechner, Laurent Girin

Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence spatially align the audio and visual modalities, making it possible to discriminate between speaking and non-speaking faces.

regression
