no code implementations • 14 Jun 2023 • Sanjana Sankar, Denis Beautemps, Frédéric Elisei, Olivier Perrotin, Thomas Hueber
Along with the release of this dataset, a benchmark will be reported for word-level recognition, a novelty in the automatic recognition of French Cued Speech (CS).
no code implementations • 4 Jul 2022 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
We collect a corpus of utterances containing contrastive focus and we evaluate the accuracy of a BERT model, finetuned to predict quantized acoustic prominence features, on these samples.
no code implementations • 17 Jun 2022 • Marc-Antoine Georges, Jean-Luc Schwartz, Thomas Hueber
The human perception system is often assumed to recruit motor knowledge when processing auditory speech inputs.
no code implementations • 11 Apr 2022 • Sanjana Sankar, Denis Beautemps, Thomas Hueber
This paper proposes a simple and effective approach for automatic recognition of Cued Speech (CS), a visual communication tool that helps people with hearing impairment understand spoken language through hand gestures that uniquely identify the uttered phonemes in complement to lipreading.
no code implementations • 5 Apr 2022 • Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber
We propose a computational model of speech production combining three components: (1) a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters; (2) a DNN-based internal forward model predicting the sensory consequences of articulatory commands; and (3) an internal inverse model, based on a recurrent neural network, recovering articulatory commands from the acoustic speech input.
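The three-component loop described in this abstract (synthesizer, internal forward model, internal inverse model) can be illustrated with a toy sketch. This is not the paper's DNN/RNN architecture: here all three components are linear stand-ins, and the forward model is deliberately identical to the synthesizer (an "ideal" internal model), which is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in articulatory-to-acoustic mapping (the neural synthesizer in the paper).
A = rng.normal(size=(4, 3))  # 3 articulatory parameters -> 4 acoustic features

def synthesizer(articulatory):
    """Produce acoustics from articulatory parameters."""
    return A @ articulatory

def forward_model(commands):
    """Internal prediction of the sensory consequences of articulatory commands.
    Here it is an ideal copy of the synthesizer, purely for illustration."""
    return A @ commands

def inverse_model(acoustics):
    """Recover articulatory commands from acoustics (least-squares stand-in
    for the paper's recurrent-network inverse model)."""
    return np.linalg.lstsq(A, acoustics, rcond=None)[0]

commands = rng.normal(size=3)
speech = synthesizer(commands)
recovered = inverse_model(speech)
predicted = forward_model(recovered)
# Feeding the inverse model's output through the forward model should
# approximately reproduce the observed acoustics.
```

In this linear toy the round trip is exact; the paper's interest lies precisely in how well learned neural forward/inverse models approximate this loop.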
no code implementations • 7 Apr 2021 • Marc-Antoine Georges, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber
It is increasingly considered that human speech perception and production both rely on articulatory representations.
no code implementations • 19 Feb 2021 • Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier
The prosody of a spoken word is determined by its surrounding context.
no code implementations • 4 Sep 2020 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e., when generating speech output for token n, the system has access to n + k tokens from the text sequence.
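The incremental mode described above can be sketched as a sliding lookahead window: for token n, the model only sees the prefix plus k tokens of lookahead, not the full text. This is a hypothetical sketch, with `synthesize_token` as a stand-in for a call into a neural seq2seq TTS model.

```python
def incremental_tts(tokens, k, synthesize_token):
    """Generate output for each token n using only tokens[0 : n + 1 + k]."""
    audio_chunks = []
    for n in range(len(tokens)):
        # The model's context is limited to k tokens of lookahead.
        visible = tokens[: n + 1 + k]
        audio_chunks.append(synthesize_token(visible, n))
    return audio_chunks

# Toy stand-in: "synthesis" just records how much context was visible.
chunks = incremental_tts(["the", "cat", "sat"], k=1,
                         synthesize_token=lambda ctx, n: (n, len(ctx)))
# -> [(0, 2), (1, 3), (2, 3)]: token 0 was generated with 2 visible tokens, etc.
```

Small k reduces latency at the cost of context; the paper studies exactly this quality/lookahead trade-off.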
1 code implementation • 28 Aug 2020 • Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda
Recently, a series of papers has presented different extensions of the variational autoencoder (VAE) to process sequential data, modeling not only the latent space but also the temporal dependencies within a sequence of data vectors and the corresponding latent vectors, relying on recurrent neural networks or state-space models.
no code implementations • 11 Jun 2018 • Fanny Roche, Thomas Hueber, Samuel Limier, Laurent Girin
This study investigates the use of non-linear unsupervised dimensionality reduction techniques to compress a music dataset into a low-dimensional representation which can be used in turn for the synthesis of new sounds.
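The compress-then-synthesize idea above can be sketched with a tiny non-linear autoencoder. Everything here is a toy assumption: the "spectral frames" are synthetic data lying near a 2-D subspace, and the network is a single tanh bottleneck trained by plain gradient descent, standing in for the paper's actual dimensionality-reduction techniques.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "spectral frames": 200 samples of 16 bins near a 2-D subspace,
# standing in for analysis frames of a music dataset.
latent_true = rng.normal(size=(200, 2))
X = latent_true @ rng.normal(size=(2, 16)) + 0.01 * rng.normal(size=(200, 16))

# Minimal autoencoder: 16 -> 2 (tanh bottleneck) -> 16, trained by gradient descent.
enc = rng.normal(scale=0.1, size=(16, 2))
dec = rng.normal(scale=0.1, size=(2, 16))
lr = 0.05
for _ in range(500):
    Z = np.tanh(X @ enc)      # encode to the low-dimensional representation
    X_hat = Z @ dec           # decode, i.e. "synthesize" frames back
    err = X_hat - X
    # Backpropagate through the decoder and the tanh encoder.
    g_dec = Z.T @ err / len(X)
    g_Z = err @ dec.T * (1 - Z ** 2)
    g_enc = X.T @ g_Z / len(X)
    dec -= lr * g_dec
    enc -= lr * g_enc

mse = float(np.mean((np.tanh(X @ enc) @ dec - X) ** 2))
# New sounds could then be synthesized by decoding novel points in the 2-D space.
```

After training, reconstruction error is well below the variance of the data, and the 2-D bottleneck is the low-dimensional representation the abstract refers to.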
Audio and Speech Processing • Sound
no code implementations • JEPTALNRECITAL 2012 • Thomas Hueber, Atef Ben-Youssef, Pierre Badin, Gérard Bailly, Frédéric Eliséi