Search Results for author: Jonas Beskow

Found 23 papers, 9 papers with code

Speech Data Augmentation for Improving Phoneme Transcriptions of Aphasic Speech Using Wav2Vec 2.0 for the PSST Challenge

no code implementations • RaPID (LREC) 2022 • Birger Moell, Jim O’Regan, Shivam Mehta, Ambika Kirkland, Harm Lameris, Joakim Gustafson, Jonas Beskow

As part of the PSST challenge, we explore how data augmentations, data sources, and model size affect phoneme transcription accuracy on speech produced by individuals with aphasia.

Automatic Phoneme Recognition Data Augmentation +1


Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

no code implementations • 30 Apr 2024 • Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson

Specifically, we use unimodal synthesis models trained on large datasets to create multimodal (but synthetic) parallel training data, and then pre-train a joint synthesis model on that material.


Unified speech and gesture synthesis using flow matching

no code implementations • 8 Oct 2023 • Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures.

Ranked #1 on Motion Synthesis on Trinity Speech-Gesture Dataset

Audio Synthesis Motion Synthesis +1


Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

no code implementations • 11 Sep 2023 • Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow

The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically-aware co-speech gesture generation.

Gesture Generation Motion Synthesis


Matcha-TTS: A fast TTS architecture with conditional flow matching

1 code implementation • 6 Sep 2023 • Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter

We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM).

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (MOS metric)

Acoustic Modelling Decoder +2

402


Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

no code implementations • 15 Jun 2023 • Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech.

Denoising Speech Synthesis


Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models

1 code implementation • 17 Nov 2022 • Simon Alexanderson, Rajmund Nagy, Jonas Beskow, Gustav Eje Henter

Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models.

Gesture Generation Motion Synthesis

126


OverFlow: Putting flows on top of neural transducers for better TTS

2 code implementations • 13 Nov 2022 • Shivam Mehta, Ambika Kirkland, Harm Lameris, Jonas Beskow, Éva Székely, Gustav Eje Henter

Neural HMMs are a type of neural transducer recently proposed for sequence-to-sequence modelling in text-to-speech.

Ranked #11 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Normalising Flows Speech Synthesis +1

29,897


Neural HMMs are all you need (for high-quality attention-free TTS)

2 code implementations • 30 Aug 2021 • Shivam Mehta, Éva Székely, Jonas Beskow, Gustav Eje Henter

Neural sequence-to-sequence TTS has achieved significantly better output quality than statistical speech synthesis using HMMs.

Ranked #3 on Speech Synthesis on LJSpeech

Speech Synthesis

29,897


Integrated Speech and Gesture Synthesis

1 code implementation • 25 Aug 2021 • Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely

Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline.

Speech Synthesis


Transflower: probabilistic autoregressive dance generation with multimodal attention

no code implementations • 25 Jun 2021 • Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson

First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder.


Generating coherent spontaneous speech and gesture from text

no code implementations • 14 Jan 2021 • Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow

In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data.

Gesture Generation Speech Synthesis


Let's Face It: Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings

1 code implementation • 11 Jun 2020 • Patrik Jonell, Taras Kucherenko, Gustav Eje Henter, Jonas Beskow

Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently output interlocutor-aware facial gestures; and c) a subjective evaluation assessing the use and relative importance of the input modalities.

Motion Synthesis


Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows

1 code implementation • Computer Graphics Forum 2020 • Simon Alexanderson, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow

In interactive scenarios, systems for generating natural animations on the fly are key to achieving believable and relatable characters.

Gesture Generation Motion Synthesis +1

468


MoGlow: Probabilistic and controllable motion synthesis using normalising flows

3 code implementations • 16 May 2019 • Gustav Eje Henter, Simon Alexanderson, Jonas Beskow

Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics.

Motion Synthesis Normalising Flows

505


Crowdsourced Multimodal Corpora Collection Tool

no code implementations • LREC 2018 • Patrik Jonell, Catharine Oertel, Dimosthenis Kontogiorgos, Jonas Beskow, Joakim Gustafson


A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction

no code implementations • LREC 2018 • Dimosthenis Kontogiorgos, Vanya Avramova, Simon Alexanderson, Patrik Jonell, Catharine Oertel, Jonas Beskow, Gabriel Skantze, Joakim Gustafson

Mutual Gaze


A Neural Network Approach to Missing Marker Reconstruction in Human Motion Capture

1 code implementation • 7 Mar 2018 • Taras Kucherenko, Jonas Beskow, Hedvig Kjellström

Optical motion capture systems have become a widely used technology in various fields, such as augmented reality, robotics, movie production, etc.

3D Reconstruction Missing Markers Reconstruction


Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

no code implementations • 24 Nov 2017 • Kalin Stefanov, Jonas Beskow, Giampiero Salvi

Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings.

Language Acquisition


Machine Learning and Social Robotics for Detecting Early Signs of Dementia

no code implementations • 5 Sep 2017 • Patrik Jonell, Joseph Mendelson, Thomas Storskog, Goran Hagman, Per Ostberg, Iolanda Leite, Taras Kucherenko, Olga Mikheeva, Ulrika Akenine, Vesna Jelic, Alina Solomon, Jonas Beskow, Joakim Gustafson, Miia Kivipelto, Hedvig Kjellstrom

This paper presents the EACare project, an ambitious multi-disciplinary collaboration with the aim to develop an embodied system, capable of carrying out neuropsychological tests to detect early signs of dementia, e.g., due to Alzheimer's disease.

BIG-bench Machine Learning