no code implementations • LREC 2022 • Siyang Wang, Joakim Gustafson, Éva Székely
Perceptual results show little difference between the compared filler insertion models, including ground-truth insertion. This may be due to ambiguity about what constitutes good filler insertion, and to a strong neural spontaneous-TTS model that produces natural speech irrespective of its input.
no code implementations • 8 Oct 2023 • Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures.
Ranked #1 on Motion Synthesis on Trinity Speech-Gesture Dataset
1 code implementation • 6 Sep 2023 • Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter
We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM).
Ranked #1 on Text-To-Speech Synthesis on LJSpeech (MOS metric)
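The OT-CFM objective mentioned above can be illustrated with a minimal numpy sketch. This is not the Matcha-TTS implementation; it only shows the standard conditional flow-matching recipe: interpolate on a straight line between a noise sample and a data sample, and regress a (here hypothetical) vector-field predictor onto the constant target field.

```python
import numpy as np

rng = np.random.default_rng(0)

def ot_cfm_training_pair(x1, sigma_min=1e-4):
    """One OT-CFM training example: sample noise x0 and a time t,
    form the straight-line interpolant x_t, and return the constant
    target vector field u_t = x1 - (1 - sigma_min) * x0."""
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise sample
    t = rng.uniform()                   # time in [0, 1]
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1.0 - sigma_min) * x0
    return t, x_t, u_t

def cfm_loss(predict_vector_field, x1_batch):
    """Mean squared error between predicted and target vector fields.
    `predict_vector_field` is a stand-in for the neural network."""
    losses = []
    for x1 in x1_batch:
        t, x_t, u_t = ot_cfm_training_pair(x1)
        v = predict_vector_field(x_t, t)
        losses.append(np.mean((v - u_t) ** 2))
    return float(np.mean(losses))
```

At synthesis time, the learned field is integrated with an ODE solver; because the OT paths are straight, few solver steps suffice, which is where the speed of this family of models comes from.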
no code implementations • 11 Jul 2023 • Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech.
no code implementations • 15 Jun 2023 • Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech.
no code implementations • 29 May 2023 • Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze
Turn-taking is a fundamental aspect of human communication where speakers convey their intention to either hold, or yield, their turn through prosodic cues.
no code implementations • 5 Mar 2023 • Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Recent work has explored using self-supervised learning (SSL) speech representations such as wav2vec 2.0 as the representation medium in standard two-stage TTS, in place of the conventionally used mel-spectrograms.
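The two-stage structure this line refers to can be sketched abstractly: the intermediate representation (mel-spectrogram or SSL features) is simply the contract between an acoustic model and a vocoder. The stand-in functions and dimensions below are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class TwoStageTTS:
    """Generic two-stage TTS: text -> intermediate frames -> waveform.
    Swapping mel-spectrograms for SSL features only changes the
    frame dimension and the models trained on either side of it."""
    acoustic_model: Callable[[str], np.ndarray]  # text -> (frames, dim)
    vocoder: Callable[[np.ndarray], np.ndarray]  # frames -> samples

    def synthesise(self, text: str) -> np.ndarray:
        frames = self.acoustic_model(text)
        return self.vocoder(frames)

# Toy stand-ins: 768-dim "SSL" frames, 256 samples per frame (illustrative).
def toy_acoustic_model(text: str) -> np.ndarray:
    return np.zeros((len(text), 768))

def toy_vocoder(frames: np.ndarray) -> np.ndarray:
    return np.zeros(frames.shape[0] * 256)
```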
no code implementations • 24 Nov 2022 • Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Spontaneous speech has many affective and pragmatic functions that are interesting and challenging to model in TTS.
2 code implementations • 13 Nov 2022 • Shivam Mehta, Ambika Kirkland, Harm Lameris, Jonas Beskow, Éva Székely, Gustav Eje Henter
Neural HMMs are a type of neural transducer recently proposed for sequence-to-sequence modelling in text-to-speech.
Ranked #11 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
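A defining property of neural HMM TTS is that the model admits an exact sequence likelihood via the classic forward algorithm over a left-to-right HMM. The sketch below shows only that forward recursion; in the actual models the per-state emission and transition terms come from a neural network, which is abstracted away here.

```python
import numpy as np

def forward_log_likelihood(log_emit, log_stay, log_move):
    """Exact log-likelihood of a frame sequence under a left-to-right,
    no-skip HMM (start in the first state, end in the last).
      log_emit: (T, S) log p(frame_t | state_s)
      log_stay: (S,)   log prob of self-transition per state
      log_move: (S,)   log prob of advancing to the next state
    """
    T, S = log_emit.shape
    alpha = np.full(S, -np.inf)
    alpha[0] = log_emit[0, 0]  # forced start in state 0
    for t in range(1, T):
        stay = alpha + log_stay                    # remain in same state
        move = np.full(S, -np.inf)
        move[1:] = alpha[:-1] + log_move[:-1]      # advance by one state
        alpha = np.logaddexp(stay, move) + log_emit[t]
    return alpha[-1]  # probability mass that ends in the final state
```

Training maximises this quantity directly, which is what distinguishes these transducers from attention-based sequence-to-sequence TTS.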
2 code implementations • 30 Aug 2021 • Shivam Mehta, Éva Székely, Jonas Beskow, Gustav Eje Henter
Neural sequence-to-sequence TTS has achieved significantly better output quality than statistical speech synthesis using HMMs.
Ranked #3 on Speech Synthesis on LJSpeech
1 code implementation • 25 Aug 2021 • Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely
Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline.
no code implementations • 14 Jan 2021 • Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow
In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data.