Search Results for author: Gustav Eje Henter

Found 28 papers, 16 papers with code

Prosody-controllable spontaneous TTS with neural HMMs

no code implementations 24 Nov 2022 Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Éva Székely

To exemplify the power of combining mid-level prosody control and ecologically valid data for reproducing intricate spontaneous speech phenomena, we evaluate the system's capability of synthesizing two types of creaky phonation.

Listen, denoise, action! Audio-driven motion synthesis with diffusion models

no code implementations 17 Nov 2022 Simon Alexanderson, Rajmund Nagy, Jonas Beskow, Gustav Eje Henter

Finally, we extend the guidance procedure to perform style interpolation in a manner that is appealing for synthesis tasks and has connections to product-of-experts models, a contribution we believe is of independent interest.

Gesture Generation Motion Synthesis
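The product-of-experts connection mentioned above can be illustrated with a minimal sketch. This is not the paper's guidance procedure. It assumes 1-D Gaussian "experts" with equal variance, and the function names are hypothetical. The score (gradient of the log-density) of a weighted product of experts is the weighted sum of the experts' scores, so changing the weights moves the combined mode between the styles.

```python
from typing import Sequence


def gaussian_score(x: float, mean: float, var: float) -> float:
    """Score d/dx log N(x; mean, var) of a single 1-D Gaussian expert."""
    return -(x - mean) / var


def poe_score(x: float, means: Sequence[float], var: float,
              weights: Sequence[float]) -> float:
    """Score of the weighted product prod_k p_k(x)**w_k: a weighted sum of expert scores."""
    return sum(w * gaussian_score(x, m, var) for m, w in zip(means, weights))


def poe_mode(means: Sequence[float], weights: Sequence[float]) -> float:
    """With equal variances, the combined score is zero at the weighted mean of the expert means."""
    return sum(w * m for m, w in zip(means, weights)) / sum(weights)


# Two hypothetical "styles" centred at 0 and 4, blended 25/75.
x_star = poe_mode([0.0, 4.0], [0.25, 0.75])
print(x_star, poe_score(x_star, [0.0, 4.0], 1.0, [0.25, 0.75]))
```

Sweeping the weights from (1, 0) to (0, 1) moves the mode continuously from one expert's mean to the other's. In this toy setting, that is the sense in which combining scores interpolates between styles.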

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks

1 code implementation 22 Sep 2022 Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts, Korin Richmond, Gustav Eje Henter

While previous work has focused on predicting listeners' ratings (mean opinion scores) of individual stimuli, we focus on the simpler task of predicting subjective preference given two speech stimuli for the same text.
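The anti-symmetry named in the title can be sketched as follows. This is a hypothetical toy construction, not the paper's architecture. A shared ("twin") encoder embeds both stimuli, and the pair logit is defined as `head(a, b) - head(b, a)`, so swapping the two stimuli always negates it. The linear encoder, the `head` function, and all weights are illustrative assumptions.

```python
import math
from typing import Sequence


def encode(x: Sequence[float], w: Sequence[float]) -> float:
    """Shared ("twin") encoder applied to both stimuli; a toy linear map here."""
    return sum(wi * xi for wi, xi in zip(w, x))


def head(ea: float, eb: float, v: Sequence[float]) -> float:
    """Arbitrary, deliberately non-symmetric comparison of two embeddings."""
    return v[0] * ea + v[1] * eb + v[2] * ea * eb * eb


def preference_logit(xa, xb, w, v) -> float:
    """Anti-symmetric by construction: swapping the stimuli flips the sign."""
    ea, eb = encode(xa, w), encode(xb, w)
    return head(ea, eb, v) - head(eb, ea, v)


def prob_a_preferred(xa, xb, w, v) -> float:
    """Logistic link from the anti-symmetric logit to P(A preferred over B)."""
    return 1.0 / (1.0 + math.exp(-preference_logit(xa, xb, w, v)))


w, v = [0.5, -1.0, 2.0], [1.0, -0.3, 0.2]          # hypothetical parameters
a, b = [0.2, 0.1, 0.4], [0.3, 0.5, 0.1]            # hypothetical stimulus features
print(prob_a_preferred(a, b, w, v), prob_a_preferred(b, a, w, v))
```

The benefit of building anti-symmetry into the model is that P(A over B) + P(B over A) = 1, and identical stimuli score exactly 0.5, whatever parameter values training produces.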

The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation

4 code implementations 22 Aug 2022 Youngwoo Yoon, Pieter Wolfert, Taras Kucherenko, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings.

Gesture Generation

Neural HMMs are all you need (for high-quality attention-free TTS)

1 code implementation 30 Aug 2021 Shivam Mehta, Éva Székely, Jonas Beskow, Gustav Eje Henter

Neural sequence-to-sequence TTS has achieved significantly better output quality than statistical speech synthesis using HMMs.

Speech Synthesis

Integrated Speech and Gesture Synthesis

1 code implementation 25 Aug 2021 Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely

Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline.

Speech Synthesis

Normalizing Flow based Hidden Markov Models for Classification of Speech Phones with Explainability

1 code implementation 1 Jul 2021 Anubhab Ghosh, Antoine Honoré, Dong Liu, Gustav Eje Henter, Saikat Chatterjee

For a standard speech phone classification setup involving 39 phones (classes) and the TIMIT dataset, we show that combining standard mel-frequency cepstral coefficient (MFCC) features, the proposed generative models, and decision fusion achieves 86.6% accuracy using generative training only.


Transflower: probabilistic autoregressive dance generation with multimodal attention

no code implementations 25 Jun 2021 Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson

First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder.

The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models

1 code implementation ACL 2021 Ulme Wennberg, Gustav Eje Henter

In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention.
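Translation invariance can be made concrete with a small check. The position-position term in self-attention is translation invariant when it depends only on the offset j - i. Equivalently, the matrix of pairwise position scores is constant along every diagonal (a Toeplitz matrix). The sketch below is an illustration, not the paper's analysis, which concerns learned embeddings. It uses fixed sinusoidal embeddings, for which the dot product of positions i and j is a sum of cos(freq * (i - j)) terms and therefore exactly offset-dependent. The dimension, base, and function names are assumptions.

```python
import math


def sinusoidal(pos: int, dim: int = 8, base: float = 10000.0) -> list[float]:
    """Fixed sinusoidal position embedding as interleaved (sin, cos) pairs."""
    vec: list[float] = []
    for k in range(dim // 2):
        freq = 1.0 / base ** (2 * k / dim)
        vec += [math.sin(pos * freq), math.cos(pos * freq)]
    return vec


def position_gram(n: int, dim: int = 8) -> list[list[float]]:
    """Matrix of dot products between the embeddings of positions i and j."""
    embs = [sinusoidal(p, dim) for p in range(n)]
    return [[sum(x * y for x, y in zip(embs[i], embs[j])) for j in range(n)]
            for i in range(n)]


def is_translation_invariant(m: list[list[float]], tol: float = 1e-9) -> bool:
    """True if m[i][j] == m[i+1][j+1] everywhere, i.e. entries depend only on j - i."""
    n = len(m)
    return all(abs(m[i][j] - m[i + 1][j + 1]) <= tol
               for i in range(n - 1) for j in range(n - 1))


print(is_translation_invariant(position_gram(16)))
```

For learned embeddings, the same diagonal-constancy test can only hold approximately, so in that setting it would be a measure of how close the model comes to translation invariance.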


Generating coherent spontaneous speech and gesture from text

no code implementations 14 Jan 2021 Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow

In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data.

Gesture Generation Speech Synthesis

Full-Glow: Fully conditional Glow for more realistic image generation

1 code implementation 10 Dec 2020 Moein Sorkhei, Gustav Eje Henter, Hedvig Kjellström

Autonomous agents, such as driverless cars, require large amounts of labeled visual data for their training.

Image Generation Object Recognition

Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation

1 code implementation 16 Jul 2020 Taras Kucherenko, Dai Hasegawa, Naoshi Kaneko, Gustav Eje Henter, Hedvig Kjellström

We provide an analysis of different representations for the input (speech) and the output (motion) of the network by both objective and subjective evaluations.

Gesture Generation Representation Learning

Robust model training and generalisation with Studentising flows

1 code implementation 11 Jun 2020 Simon Alexanderson, Gustav Eje Henter

Normalising flows are tractable probabilistic models that leverage the power of deep learning to describe a wide parametric family of distributions, all while remaining trainable using maximum likelihood.

Normalising Flows
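The phrase "trainable using maximum likelihood" refers to the change-of-variables formula. For an invertible map z = f(x), log p(x) = log p_z(f(x)) + log |df/dx|. The sketch below evaluates this exact log-likelihood for a one-layer elementwise affine flow. The title suggests a heavier-tailed Student's t base distribution in place of the usual Gaussian, so both bases are included to show how a single outlier affects each. The affine flow, the degrees of freedom, and the data are illustrative assumptions, not the paper's model.

```python
import math
from typing import Callable, Sequence


def gaussian_logpdf(z: float) -> float:
    """Standard normal log-density."""
    return -0.5 * (z * z + math.log(2 * math.pi))


def student_t_logpdf(z: float, nu: float) -> float:
    """Standard Student's t log-density with nu degrees of freedom."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def affine_flow_loglik(xs: Sequence[float], mu: float, log_scale: float,
                       base_logpdf: Callable[[float], float]) -> float:
    """Exact log-likelihood under z = (x - mu) / exp(log_scale)."""
    total = 0.0
    for x in xs:
        z = (x - mu) * math.exp(-log_scale)   # invertible map x -> z
        total += base_logpdf(z) - log_scale   # log|dz/dx| = -log_scale
    return total


xs = [0.1, -0.2, 0.05, 50.0]                  # hypothetical data with one outlier
print(affine_flow_loglik(xs, 0.0, 0.0, gaussian_logpdf))
print(affine_flow_loglik(xs, 0.0, 0.0, lambda z: student_t_logpdf(z, 3.0)))
```

Under the Gaussian base, the outlier contributes a penalty that grows quadratically with its distance, and so it dominates the objective (and its gradients). Under the t base, the penalty grows only logarithmically, which is one plausible route to the robustness named in the title.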

Let's Face It: Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings

1 code implementation 11 Jun 2020 Patrik Jonell, Taras Kucherenko, Gustav Eje Henter, Jonas Beskow

Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently output interlocutor-aware facial gestures; and c) a subjective evaluation assessing the use and relative importance of the input modalities.

Motion Synthesis

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

1 code implementation 10 Nov 2019 Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi

Nowadays, vast amounts of speech data are recorded on low-quality devices such as smartphones, tablets, laptops, and medium-quality microphones.

Sound Audio and Speech Processing

MoGlow: Probabilistic and controllable motion synthesis using normalising flows

3 code implementations 16 May 2019 Gustav Eje Henter, Simon Alexanderson, Jonas Beskow

Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics.

Motion Synthesis Normalising Flows

Analyzing Input and Output Representations for Speech-Driven Gesture Generation

1 code implementation arXiv 2019 Taras Kucherenko, Dai Hasegawa, Gustav Eje Henter, Naoshi Kaneko, Hedvig Kjellström

We evaluate different representation sizes in order to find the most effective dimensionality for the representation.

Gesture Generation Human-Computer Interaction I.2.6; I.5.1; J.4

Kernel Density Estimation-Based Markov Models with Hidden State

no code implementations 30 Jul 2018 Gustav Eje Henter, Arne Leijon, W. Bastiaan Kleijn

We consider Markov models of stochastic processes where the next-step conditional distribution is defined by a kernel density estimator (KDE), similar to Markov forecast densities and certain time-series bootstrap schemes.

Density Estimation Time Series
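A KDE-defined next-step distribution can be sketched for a first-order model with a Gaussian kernel and a single fixed bandwidth h, all of which are assumptions here. The hidden-state extension in the title is not shown. Each observed transition (x_i, x_{i+1}) votes for the next value, weighted by how close x_i is to the current value. Sampling then picks a transition with those weights and adds kernel noise, much like a smoothed Markov bootstrap.

```python
import math
import random
from typing import Sequence


def gauss_kernel(u: float, h: float) -> float:
    """Gaussian kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))


def kde_markov_density(x_next: float, x_now: float,
                       series: Sequence[float], h: float) -> float:
    """Conditional density p(x_next | x_now) built from observed transitions."""
    num = den = 0.0
    for a, b in zip(series, series[1:]):
        w = gauss_kernel(x_now - a, h)        # similarity of current state to x_i
        num += w * gauss_kernel(x_next - b, h)
        den += w
    return num / den


def kde_markov_sample(x_now: float, series: Sequence[float], h: float,
                      rng: random.Random) -> float:
    """Draw x_next: choose a transition by kernel weight, then add kernel noise."""
    pairs = list(zip(series, series[1:]))
    weights = [gauss_kernel(x_now - a, h) for a, _ in pairs]
    _, b = rng.choices(pairs, weights=weights, k=1)[0]
    return b + h * rng.gauss(0.0, 1.0)


series = [0.0, 1.0] * 10                      # hypothetical alternating series
print(kde_markov_density(1.0, 0.0, series, 0.3), kde_markov_density(0.0, 0.0, series, 0.3))
print(kde_markov_sample(0.0, series, 0.1, random.Random(0)))
```

Because the conditional density is a weighted mixture of kernels, it integrates to one for any current value, and no parametric form is imposed on the dynamics.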

Consensus-based Sequence Training for Video Captioning

no code implementations 27 Dec 2017 Sang Phan, Gustav Eje Henter, Yusuke Miyao, Shin'ichi Satoh

First we show that, by replacing model samples with ground-truth sentences, RL training can be seen as a form of weighted cross-entropy loss, giving a fast, RL-based pre-training algorithm.

Video Captioning
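The "weighted cross-entropy" view can be sketched as follows. In a REINFORCE-style gradient, replacing sampled captions with ground-truth captions y_j gives the loss -sum_j r(y_j) log p(y_j | video). Each reference is therefore weighted by its reward. The code uses a toy unigram-F1 score against the other references as a stand-in for a consensus reward. That metric, the function names, and the captions are assumptions, not the paper's reward.

```python
from collections import Counter
from typing import Sequence


def unigram_f1(a: str, b: str) -> float:
    """Toy sentence similarity: F1 over overlapping word counts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    overlap = sum((ca & cb).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / sum(ca.values()), overlap / sum(cb.values())
    return 2 * p * r / (p + r)


def consensus_weights(refs: Sequence[str]) -> list[float]:
    """Score each reference by its mean similarity to the other references."""
    return [sum(unigram_f1(r, o) for k, o in enumerate(refs) if k != j) / (len(refs) - 1)
            for j, r in enumerate(refs)]


def weighted_cross_entropy(log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Loss -mean_j w_j * log p(y_j | video) over the ground-truth captions."""
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


refs = ["a man is cooking", "a man cooks food", "a cat sleeps"]  # hypothetical references
print(consensus_weights(refs))
```

With uniform weights, the loss reduces to ordinary maximum-likelihood (cross-entropy) training. Down-weighting outlying references is what the reward adds, and the loss needs no sampling, which is why it can serve as fast pre-training.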

Median-Based Generation of Synthetic Speech Durations using a Non-Parametric Approach

no code implementations 22 Aug 2016 Srikanth Ronanki, Oliver Watts, Simon King, Gustav Eje Henter

This paper proposes a new approach to duration modelling for statistical parametric speech synthesis in which a recurrent statistical model is trained to output a phone transition probability at each timestep (acoustic frame).

Speech Synthesis
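Per-frame transition probabilities q_t define a duration distribution. The probability of still being in the phone after t frames is the product of (1 - q_k) for k = 1..t. The sketch below, consistent with the title's median-based generation but not taken from the paper, returns the median duration: the first frame at which the cumulative probability of having transitioned reaches 0.5. Returning `None` when the probabilities never get there is an assumption.

```python
from typing import Optional, Sequence


def median_duration(transition_probs: Sequence[float]) -> Optional[int]:
    """Smallest t (in frames) with P(duration <= t) >= 0.5, given per-frame hazards."""
    survival = 1.0                      # P(still in the current phone)
    for t, q in enumerate(transition_probs, start=1):
        survival *= 1.0 - q             # must not transition at frame t to survive it
        if survival <= 0.5:             # i.e. CDF = 1 - survival >= 0.5
            return t
    return None                         # median not reached within the given frames


print(median_duration([0.1] * 20))      # constant hazard 0.1: 0.9**7 < 0.5 <= 0.9**6
```

A median, unlike a sampled duration, gives a deterministic and typical value, and it needs no parametric assumption about the duration distribution.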
