Talking Face Generation
20 papers with code • 1 benchmark • 4 datasets
Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics.
However, existing approaches fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, leaving significant parts of the video out of sync with the new audio.
Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate a talking face video with accurate lip synchronization while maintaining smooth transitions of both lip and facial movements over the entire video clip.
Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.
To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers.
We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions.
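A cascade approach of this kind is typically split into two stages: a first network maps audio features to an intermediate facial representation (e.g. landmarks), and a second network renders image frames conditioned on that representation plus a reference face. The following is a minimal numpy sketch of that two-stage pipeline shape only; the dimensions (13 audio features, 68 landmarks, 64x64 frames) and the linear placeholder "generators" are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper): 13 audio
# features per frame, 68 2-D facial landmarks, 64x64 grayscale frames.
AUDIO_DIM, N_LANDMARKS, IMG_SIDE = 13, 68, 64

# Stage 1 of the cascade: audio features -> landmark displacements.
# A single random linear map stands in for the first generator.
W1 = rng.normal(scale=0.01, size=(AUDIO_DIM, N_LANDMARKS * 2))

def audio_to_landmarks(audio_feats, base_landmarks):
    """Predict per-frame landmarks from audio (placeholder for generator 1)."""
    offsets = audio_feats @ W1                    # (T, 136)
    return base_landmarks.ravel() + offsets       # (T, 136)

# Stage 2 of the cascade: landmarks + reference face -> image frames.
W2 = rng.normal(scale=0.01, size=(N_LANDMARKS * 2, IMG_SIDE * IMG_SIDE))

def landmarks_to_frames(landmarks, ref_face):
    """Render frames conditioned on landmarks (placeholder for generator 2)."""
    residual = landmarks @ W2                     # (T, 4096)
    return ref_face.ravel() + residual            # (T, 4096)

# Toy inputs: 30 audio frames, a neutral landmark set, a reference face.
T = 30
audio = rng.normal(size=(T, AUDIO_DIM))
neutral = rng.uniform(size=(N_LANDMARKS, 2))
face = rng.uniform(size=(IMG_SIDE, IMG_SIDE))

lms = audio_to_landmarks(audio, neutral)
frames = landmarks_to_frames(lms, face).reshape(T, IMG_SIDE, IMG_SIDE)
print(frames.shape)  # one rendered frame per audio frame
```

In the actual GAN setting, each linear map would be a trained generator with its own adversarial discriminator; the cascade decouples audio-to-motion prediction from photorealistic rendering, which is what lends robustness to face shape, view angle, and noisy audio.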
Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head.