no code implementations • RaPID (LREC) 2022 • Birger Moell, Jim O’Regan, Shivam Mehta, Ambika Kirkland, Harm Lameris, Joakim Gustafson, Jonas Beskow
As part of the PSST challenge, we explore how data augmentations, data sources, and model size affect phoneme transcription accuracy on speech produced by individuals with aphasia.
no code implementations • 8 Oct 2023 • Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures.
Ranked #1 on Motion Synthesis on Trinity Speech-Gesture Dataset
no code implementations • 11 Sep 2023 • Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow
The output of the contrastive speech and motion pretraining (CSMP) module is used as a conditioning signal in the diffusion-based gesture synthesis model to achieve semantically aware co-speech gesture generation.
1 code implementation • 6 Sep 2023 • Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter
We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM).
Ranked #1 on Text-To-Speech Synthesis on LJSpeech (MOS metric)
no code implementations • 15 Jun 2023 • Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
With read-aloud speech synthesis achieving high naturalness scores, there is growing research interest in synthesising spontaneous speech.
no code implementations • 24 Nov 2022 • Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Spontaneous speech has many affective and pragmatic functions that are interesting and challenging to model in TTS.
2 code implementations • 13 Nov 2022 • Shivam Mehta, Ambika Kirkland, Harm Lameris, Jonas Beskow, Éva Székely, Gustav Eje Henter
Neural HMMs are a type of neural transducer recently proposed for sequence-to-sequence modelling in text-to-speech.
Ranked #11 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
2 code implementations • 30 Aug 2021 • Shivam Mehta, Éva Székely, Jonas Beskow, Gustav Eje Henter
Neural sequence-to-sequence TTS has achieved significantly better output quality than statistical speech synthesis using HMMs.
Ranked #3 on Speech Synthesis on LJSpeech