Talking Face Generation
40 papers with code • 2 benchmarks • 6 datasets
Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics.
(Image credit: Talking Face Generation by Adversarially Disentangled Audio-Visual Representation)
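As a rough illustration of the task interface, the sketch below shows what an audio-driven pipeline typically consumes and produces: one identity image and a speech waveform in, a sequence of synchronized face frames out. The class and method names (`TalkingFaceGenerator`, `generate`) are hypothetical placeholders and do not correspond to the API of any paper listed on this page.

```python
# Minimal sketch of the talking-face-generation task interface.
# All names here are hypothetical; real systems expose different APIs,
# but the input/output contract (image + audio -> video frames) is similar.
import numpy as np


class TalkingFaceGenerator:
    """Hypothetical model: maps (identity image, speech audio) -> video frames."""

    def __init__(self, fps: int = 25):
        self.fps = fps

    def generate(self, face_image: np.ndarray, audio: np.ndarray,
                 sample_rate: int = 16000) -> np.ndarray:
        """Return an array of frames shaped (num_frames, H, W, 3).

        num_frames is tied to the audio duration so that lip motion can be
        synchronized with the speech signal.
        """
        num_frames = int(len(audio) / sample_rate * self.fps)
        # Placeholder: a real model would condition each frame on a window of
        # audio features (e.g. a mel spectrogram) around that timestep.
        return np.repeat(face_image[None], num_frames, axis=0)


# Usage: one identity image plus one second of audio -> 25 frames at 25 fps.
identity = np.zeros((256, 256, 3), dtype=np.uint8)
speech = np.zeros(16000, dtype=np.float32)
frames = TalkingFaceGenerator().generate(identity, speech)
print(frames.shape)  # (25, 256, 256, 3)
```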
Most implemented papers
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
However, existing works fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, leaving significant parts of the video out of sync with the new audio.
MakeItTalk: Speaker-Aware Talking-Head Animation
We present a method that generates expressive talking heads from a single facial image with audio as the only input.
Talking Face Generation by Conditional Recurrent Adversarial Network
Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate a talking face video with accurate lip synchronization while maintaining smooth transitions of both lip and facial movement over the entire video clip.
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.
ReenactGAN: Learning to Reenact Faces via Boundary Transfer
A transformer is subsequently used to adapt the boundary of the source face to that of the target face.
Capture, Learning, and Synthesis of 3D Speaking Styles
To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers.
Neural Voice Puppetry: Audio-driven Facial Reenactment
Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head.
Speech Driven Talking Face Generation from a Single Image and an Emotion Condition
Visual emotion expression plays an important role in audiovisual speech communication.
Stochastic Talking Face Generation Using Latent Distribution Matching
Indeed, being able to generate only a single, deterministic talking face for a given input would make a system appear almost robotic in nature.
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
Generating high-fidelity talking head video that fits the input audio sequence is a challenging problem that has received considerable attention recently.