no code implementations • 8 Oct 2023 • Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures.
Ranked #1 on Motion Synthesis on Trinity Speech-Gesture Dataset
no code implementations • 11 Sep 2023 • Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow
The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically aware co-speech gesture generation.
no code implementations • 15 Jun 2023 • Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech.
1 code implementation • 17 Nov 2022 • Simon Alexanderson, Rajmund Nagy, Jonas Beskow, Gustav Eje Henter
Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models.
1 code implementation • 25 Aug 2021 • Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely
Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline.
no code implementations • 25 Jun 2021 • Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson
First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder.
no code implementations • 14 Jan 2021 • Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow
In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data.
1 code implementation • 11 Jun 2020 • Simon Alexanderson, Gustav Eje Henter
Normalising flows are tractable probabilistic models that leverage the power of deep learning to describe a wide parametric family of distributions, all while remaining trainable using maximum likelihood.
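As a rough illustration of what "tractable" and "trainable using maximum likelihood" mean here, the sketch below fits the simplest possible flow, a one-dimensional affine map x = mu + exp(log_sigma) * z with z ~ N(0, 1), by gradient ascent on the exact log-likelihood given by the change-of-variables formula. It is not the model from the paper: the function names `log_likelihood` and `fit`, the toy data, and the hyperparameters are all assumptions made for this example, and real flows replace the affine map with deep invertible networks.

```python
import math
import random
import statistics


def log_likelihood(x, mu, log_sigma):
    """Exact log-density of x under the flow x = mu + exp(log_sigma) * z, z ~ N(0, 1)."""
    z = (x - mu) * math.exp(-log_sigma)                 # inverse transform x -> z
    log_base = -0.5 * (z * z + math.log(2 * math.pi))   # log N(z; 0, 1)
    log_det = -log_sigma                                # log |dz/dx|
    return log_base + log_det


def fit(data, steps=2000, lr=0.05):
    """Maximise the mean log-likelihood by plain gradient ascent (hypothetical settings)."""
    mu, log_sigma = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        g_mu = g_ls = 0.0
        for x in data:
            z = (x - mu) * math.exp(-log_sigma)
            g_mu += z * math.exp(-log_sigma)   # d log p(x) / d mu
            g_ls += z * z - 1.0                # d log p(x) / d log_sigma
        mu += lr * g_mu / n
        log_sigma += lr * g_ls / n
    return mu, log_sigma


random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(500)]   # toy data, not from any paper
mu, log_sigma = fit(data)

# For this affine flow the maximum-likelihood solution is known in closed form:
# the sample mean and the (population) standard deviation of the data.
print(mu, statistics.fmean(data))
print(math.exp(log_sigma), statistics.pstdev(data))
```

Because the density is exact, the same objective serves for both training and evaluation; no variational bound or adversarial critic is needed.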
1 code implementation • Computer Graphics Forum 2020 • Simon Alexanderson, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow
In interactive scenarios, systems for generating natural animations on the fly are key to achieving believable and relatable characters.
1 code implementation • 25 Jan 2020 • Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Eje Henter, Simon Alexanderson, Iolanda Leite, Hedvig Kjellström
During speech, people spontaneously gesticulate, which plays a key role in conveying information.
3 code implementations • 16 May 2019 • Gustav Eje Henter, Simon Alexanderson, Jonas Beskow
Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics.