Search Results for author: Shivam Mehta

Found 8 papers, 3 papers with code

Unified speech and gesture synthesis using flow matching

no code implementations • 8 Oct 2023 • Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures.

Audio Synthesis Motion Synthesis +1

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

no code implementations • 11 Sep 2023 • Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow

The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically-aware co-speech gesture generation.

Gesture Generation Motion Synthesis

Matcha-TTS: A fast TTS architecture with conditional flow matching

1 code implementation • 6 Sep 2023 • Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter

We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM).

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (MOS metric)

Acoustic Modelling Speech Synthesis +1

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

no code implementations • 15 Jun 2023 • Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech.

Denoising Speech Synthesis

Prosody-controllable spontaneous TTS with neural HMMs

no code implementations • 24 Nov 2022 • Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Éva Székely

Spontaneous speech has many affective and pragmatic functions that are interesting and challenging to model in TTS.

OverFlow: Putting flows on top of neural transducers for better TTS

2 code implementations • 13 Nov 2022 • Shivam Mehta, Ambika Kirkland, Harm Lameris, Jonas Beskow, Éva Székely, Gustav Eje Henter

Neural HMMs are a type of neural transducer recently proposed for sequence-to-sequence modelling in text-to-speech.

Ranked #11 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Normalising Flows Speech Synthesis +1

Neural HMMs are all you need (for high-quality attention-free TTS)

2 code implementations • 30 Aug 2021 • Shivam Mehta, Éva Székely, Jonas Beskow, Gustav Eje Henter

Neural sequence-to-sequence TTS has achieved significantly better output quality than statistical speech synthesis using HMMs.

Speech Synthesis
