Talking Head Generation

40 papers with code • 7 benchmarks • 3 datasets

Talking head generation is the task of generating a talking face from a set of images of a person.

( Image credit: Few-Shot Adversarial Learning of Realistic Neural Talking Head Models )

Benchmarks

Add a Result

These leaderboards are used to track progress in Talking Head Generation

Dataset	Best Model	Compare
VoxCeleb2 - 1-shot learning	Fast Bi-layer Avatars (medium size)	See all
VoxCeleb1 - 1-shot learning	Few-shot Adversarial Model	See all
VoxCeleb1 - 8-shot learning	Few-shot Adversarial Model	See all
VoxCeleb1 - 32-shot learning	Few-shot Adversarial Model	See all
VoxCeleb2 - 8-shot learning	CainGAN	See all
VoxCeleb2 - 32-shot learning	Few-shot Adversarial Model	See all
100 sleep nights of 8 caregivers	Ashok	See all

Datasets

Subtasks

Unconstrained Lip-synchronization

Most implemented papers

Most implemented Social Latest No code

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

vincent-thevenin/Realistic-Neural-Talking-Head-Models • • ICCV 2019

In order to create a personalized talking head model, these works require training on a large dataset of images of a single person.

Paper
Code

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

Rudrabha/Wav2Lip • • 23 Aug 2020

However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.

Paper
Code

MakeItTalk: Speaker-Aware Talking-Head Animation

yzhou359/MakeItTalk • • 27 Apr 2020

We present a method that generates expressive talking heads from a single facial image with audio as the only input.

Paper
Code

ReenactGAN: Learning to Reenact Faces via Boundary Transfer

wywu/ReenactGAN • • ECCV 2018

A transformer is subsequently used to adapt the boundary of source face to the boundary of target face.

Paper
Code

Text-based Editing of Talking-head Video

sibozhang/Text2Video • 4 Jun 2019

To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material.

Paper
Code

Neural Voice Puppetry: Audio-driven Facial Reenactment

miu200521358/NeuralVoicePuppetryMMD • • ECCV 2020

Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head.

Paper
Code

0-Step Capturability, Motion Decomposition and Global Feedback Control of the 3D Variable Height-Inverted Pendulum

GabrielEGC/IHMC-Robotics • 12 Dec 2019

We also prove that the 3D VHIP with Fixed CoP is the same as its 2D version, and we generalize controllers working on the 2D VHIP to the 3D VHIP.

Paper
Code

What comprises a good talking-head video generation?: A Survey and Benchmark

lelechen63/talking-head-generation-survey • • 7 May 2020

In this work, we present a carefully-designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.

Paper
Code