Talking Head Generation
40 papers with code • 7 benchmarks • 3 datasets
Talking head generation is the task of synthesizing a video of a talking face from one or more images of a person, typically driven by an audio clip or a driving video.
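To make the input/output contract of the task concrete, here is a minimal, purely illustrative stub of the typical interface (the function name and feature dimensions are assumptions for illustration, not any specific paper's API): one reference image plus per-frame audio features in, a video tensor out.

```python
import numpy as np

def generate_talking_head(reference_image: np.ndarray,
                          audio_features: np.ndarray) -> np.ndarray:
    """Illustrative stub of the task interface: given one reference image
    of shape (H, W, 3) and per-frame audio features of shape (T, D),
    return a video tensor of shape (T, H, W, 3). A real model would
    predict lip motion and head pose from the audio and warp/render the
    reference identity accordingly; here we simply repeat the reference
    frame to show the expected tensor shapes."""
    num_frames = audio_features.shape[0]
    return np.repeat(reference_image[None, ...], num_frames, axis=0)

# Toy inputs: a 64x64 RGB reference image and 25 frames of 80-dim audio features.
reference = np.zeros((64, 64, 3), dtype=np.uint8)
audio = np.zeros((25, 80), dtype=np.float32)
video = generate_talking_head(reference, audio)
print(video.shape)  # (25, 64, 64, 3)
```

One-shot methods in the list below (e.g. FONT, OPT) follow this single-reference-image setting, while few-shot methods accept a small set of reference images instead.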
(Image credit: Few-Shot Adversarial Learning of Realistic Neural Talking Head Models)
Latest papers with no code
RADIO: Reference-Agnostic Dubbing Video Synthesis
One of the most challenging problems in audio-driven talking head generation is achieving high-fidelity detail while ensuring precise synchronization.
From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications
This paper presents a comprehensive survey of state-of-the-art methods for talking head generation.
Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline
In dyadic speaker-listener interactions, the listener's head reactions, together with the speaker's head movements, constitute an important channel of non-verbal semantic expression.
Interactive Conversational Head Generation
Based on ViCo and ViCo-X, we define three novel tasks targeting interaction modeling during face-to-face conversation: 1) responsive listening head generation, which makes listeners respond actively to the speaker with non-verbal signals; 2) expressive talking head generation, which guides speakers to be aware of listeners' behaviors; and 3) conversational head generation, which integrates the talking and listening abilities in one interlocutor.
Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks
Given an audio clip and a reference face image, the goal of talking head generation is to generate a high-fidelity talking head video.
High-Fidelity and Freely Controllable Talking Head Video Generation
Our method leverages both self-supervised learned landmarks and 3D face model-based landmarks to model the motion.
One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image.
TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
In this work, we propose an expression-controllable one-shot talking head method, dubbed TalkCLIP, where the speaking expression is specified by a natural language description.
FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions
Specifically, the head pose prediction module is designed to generate head pose sequences from the source face and driving audio.
OPT: One-shot Pose-Controllable Talking Head Generation
To solve the identity mismatch problem and achieve high-quality free pose control, we present One-shot Pose-controllable Talking head generation network (OPT).