no code implementations • 13 Jun 2024 • Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah
To this end, we propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to-Speech (TTS), which controls the duration of synthesized speech so that it aligns well with the speaker's lip movements in the reference video, even when the spoken text differs or is in a different language.
no code implementations • 3 Jul 2023 • Neha Sahipjohn, Neil Shah, Vishal Tambrahalli, Vineet Gandhi
Significant progress has been made in speaker-dependent Lip-to-Speech synthesis, which aims to generate speech from silent videos of talking faces.
no code implementations • 1 Mar 2023 • Neil Shah, Saiteja Kosgi, Vishal Tambrahalli, Neha Sahipjohn, Niranjan Pedanekar, Vineet Gandhi
We present ParrotTTS, a modularized text-to-speech synthesis model that leverages disentangled self-supervised speech representations.