no code implementations • 2 Mar 2024 • Sindhu Hegde, Rudrabha Mukhopadhyay, C. V. Jawahar, Vinay Namboodiri
In this paper, we introduce a novel approach to address the task of synthesizing speech from silent videos of any in-the-wild speaker solely based on lip movements.
1 code implementation • 7 Oct 2022 • Madhav Agarwal, Anchit Gupta, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C V Jawahar
We use a state-of-the-art face reenactment network to detect key points in the non-pivot frames and transmit them to the receiver.
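Sending only detected key points for non-pivot frames is what makes this scheme bandwidth-efficient. The following is a minimal back-of-the-envelope sketch of that trade-off; the frame size, keypoint count, and pivot interval are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch: bandwidth of pivot-frame + keypoint transmission.
# All constants below are assumptions for demonstration, not the paper's.

FRAME_BYTES = 256 * 256 * 3      # one raw RGB pivot frame
KEYPOINT_BYTES = 10 * 2 * 4      # e.g. 10 (x, y) keypoints as float32

def bytes_sent(num_frames: int, pivot_interval: int) -> int:
    """Total bytes if every pivot_interval-th frame is sent in full and
    the remaining frames are represented by keypoints only."""
    pivots = (num_frames + pivot_interval - 1) // pivot_interval
    non_pivots = num_frames - pivots
    return pivots * FRAME_BYTES + non_pivots * KEYPOINT_BYTES

full = 300 * FRAME_BYTES                       # sending every frame raw
compressed = bytes_sent(300, pivot_interval=30)
ratio = full / compressed                      # rough compression factor
```

Even with these toy numbers, transmitting keypoints instead of full frames cuts the payload by more than an order of magnitude, which is the core motivation for reenactment-based video compression.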
1 code implementation • 6 Oct 2022 • Madhav Agarwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar
The identity-aware generator takes the source image and the warped motion features as input to generate a high-quality output with fine-grained details.
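The "warped motion features" fed to the generator are source features resampled according to an estimated motion field. A minimal nearest-neighbour sketch of that warping step is below; the actual network uses learned, differentiable warping, and all names here are illustrative.

```python
# Illustrative sketch: warping a feature grid with a per-pixel flow field
# (nearest-neighbour, with border clamping). The real model warps deep
# feature maps differentiably; this only shows the indexing idea.

def warp(features, flow):
    """features: H x W grid; flow[r][c] = (dr, dc) offset to sample from."""
    h, w = len(features), len(features[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dr, dc = flow[r][c]
            sr = min(max(r + dr, 0), h - 1)   # clamp to image bounds
            sc = min(max(c + dc, 0), w - 1)
            out[r][c] = features[sr][sc]
    return out

feat = [[r * 4 + c for c in range(4)] for r in range(4)]
# a uniform flow that samples each pixel from one column to its right
shift_left = [[(0, 1) for _ in range(4)] for _ in range(4)]
warped = warp(feat, shift_left)
```

The generator then combines such warped features with the source identity image to fill in fine-grained detail that pure warping cannot recover.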
no code implementations • 1 Sep 2022 • Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar
With the help of multiple powerful discriminators that guide the training process, our generator learns to synthesize speech sequences in any voice for the lip movements of any person.
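Training against multiple discriminators typically means the generator minimizes a weighted sum of per-discriminator adversarial losses. The sketch below shows that aggregation with a non-saturating GAN loss; the discriminator names and weights are assumptions, not the paper's actual objective.

```python
import math

# Illustrative sketch: combining feedback from several discriminators into
# one generator loss. Discriminator names/weights are hypothetical.

def generator_loss(disc_scores, weights):
    """Weighted sum of non-saturating adversarial losses, -log D(G(x)),
    where each score is a discriminator's realness estimate in (0, 1)."""
    total = 0.0
    for name, score in disc_scores.items():
        total += weights[name] * -math.log(max(score, 1e-8))
    return total

scores = {"sync": 0.7, "quality": 0.5, "voice": 0.9}
weights = {"sync": 1.0, "quality": 0.5, "voice": 0.3}
loss = generator_loss(scores, weights)
```

Each discriminator penalizes a different failure mode (e.g. lip-sync, audio quality, speaker voice), so the generator receives a richer training signal than a single realness score would provide.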
no code implementations • 21 Aug 2022 • Aditya Agarwal, Bipasha Sen, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V Jawahar
Because of the manual pipeline, such platforms are also limited in vocabulary, supported languages, accents, and speakers, and they have a high usage cost.
1 code implementation • 21 Aug 2022 • Aditya Agarwal, Bipasha Sen, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
To tackle this challenge, we introduce video-to-video (V2V) face-swapping, a novel task of face-swapping that can preserve (1) the identity and expressions of the source (actor) face video and (2) the background and pose of the target (double) video.
1 code implementation • 17 Aug 2022 • Sindhu B Hegde, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar
We show that when we process this $8\times8$ video with the right set of audio and image priors, we can obtain a full-length, $256\times256$ video.
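Going from $8\times8$ to $256\times256$ is a $32\times$ spatial upscaling per axis. The naive baseline below makes that factor concrete via nearest-neighbour repetition; the paper's method is generative and uses audio/image priors to hallucinate detail, which simple interpolation cannot.

```python
# Illustrative sketch: nearest-neighbour upsampling from 8x8 to 256x256,
# showing the 32x factor the learned priors must fill with real detail.

def upsample_nn(frame, factor):
    """Repeat each pixel in a factor x factor block."""
    return [[frame[r // factor][c // factor]
             for c in range(len(frame[0]) * factor)]
            for r in range(len(frame) * factor)]

lowres = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 frame
hires = upsample_nn(lowres, 32)
```

Every output pixel here is just a copy of one of the 64 input values; the gap between this blocky result and a plausible $256\times256$ face is exactly what the audio and image priors supply.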
no code implementations • 2 Nov 2021 • Bipasha Sen, Aditya Agarwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar
Apart from evaluating our approach on the ALS patient, we also extend it to people with hearing impairment relying extensively on lip movements to communicate.
no code implementations • 16 Oct 2021 • Anchit Gupta, Faizan Farooq Khan, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar
Our evaluations show a clear improvement in the efficiency of using human editors and an improved video generation quality.
1 code implementation • 24 Jun 2021 • Parul Kapoor, Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay Namboodiri, C V Jawahar
Since the current datasets are inadequate for generating sign language directly from speech, we collect and release the first Indian sign language dataset comprising speech-level annotations, text transcripts, and the corresponding sign-language videos.
1 code implementation • 20 Dec 2020 • Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
In this work, we re-think the task of speech enhancement in unconstrained real-world environments.
Ranked #1 on Speech Denoising on LRS3+VGGSound
4 code implementations • 23 Aug 2020 • K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.
Ranked #1 on Unconstrained Lip-synchronization on LRS3 (using extra training data)
1 code implementation • CVPR 2020 • K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
Ranked #1 on Lip Reading on LRW
no code implementations • LREC 2020 • Nimisha Srivastava, Rudrabha Mukhopadhyay, Prajwal K R, C. V. Jawahar
We believe that one of the major reasons for this is the lack of large, publicly available text-to-speech corpora in these languages that are suitable for training neural text-to-speech systems.
1 code implementation • ACM Multimedia 2019 • Prajwal K R, Rudrabha Mukhopadhyay, Jerin Philip, Abhishek Jha, Vinay Namboodiri, C. V. Jawahar
As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.
Ranked #1 on Talking Face Generation on LRW (using extra training data)