Search Results for author: Shivam Mehta

Found 8 papers, 3 papers with code

Speech Data Augmentation for Improving Phoneme Transcriptions of Aphasic Speech Using Wav2Vec 2.0 for the PSST Challenge

no code implementations • RaPID (LREC) 2022 • Birger Moell, Jim O’Regan, Shivam Mehta, Ambika Kirkland, Harm Lameris, Joakim Gustafson, Jonas Beskow

As part of the PSST challenge, we explore how data augmentations, data sources, and model size affect phoneme transcription accuracy on speech produced by individuals with aphasia.

Automatic Phoneme Recognition Data Augmentation +1

Unified speech and gesture synthesis using flow matching

no code implementations • 8 Oct 2023 • Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures.

Ranked #1 on Motion Synthesis on Trinity Speech-Gesture Dataset

Audio Synthesis Motion Synthesis +1

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

no code implementations • 11 Sep 2023 • Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow

The output of the contrastive speech and motion pretraining (CSMP) module is used as a conditioning signal in the diffusion-based gesture synthesis model to achieve semantically aware co-speech gesture generation.

Gesture Generation Motion Synthesis

Matcha-TTS: A fast TTS architecture with conditional flow matching

1 code implementation • 6 Sep 2023 • Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter

We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM).

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (MOS metric)