Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics.
To address this, we introduce a unique 4D face dataset comprising about 29 minutes of 4D scans captured at 60 fps, with synchronized audio from 12 speakers.
We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions.
In light of recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term "Face-to-Face Translation".