no code implementations • 2 Apr 2025 • Yuxuan Luo, Zhengkun Rong, Lizhen Wang, Longhao Zhang, Tianshu Hu, Yongming Zhu
For motion guidance, our hybrid control signals, which integrate implicit facial representations, 3D head spheres, and 3D body skeletons, achieve robust control of facial expressions and body movements while producing expressive, identity-preserving animations.
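To make the hybrid-control idea concrete, here is a minimal sketch of fusing the three signal types into one conditioning map. All module names, dimensions, and the assumption that the head sphere and body skeleton arrive as pre-rendered 3-channel maps are illustrative, not the paper's actual architecture.

```python
# Hypothetical fusion of hybrid control signals into a single conditioning
# tensor; shapes and modules are assumptions for illustration only.
import torch
import torch.nn as nn

class HybridControlEncoder(nn.Module):
    def __init__(self, face_dim=512, cond_channels=64):
        super().__init__()
        # Implicit facial representation: a latent vector broadcast to a map.
        self.face_proj = nn.Linear(face_dim, cond_channels)
        # 3D head sphere and body skeleton: assumed pre-rendered as 3-channel maps.
        self.sphere_conv = nn.Conv2d(3, cond_channels, 3, padding=1)
        self.skeleton_conv = nn.Conv2d(3, cond_channels, 3, padding=1)
        self.fuse = nn.Conv2d(3 * cond_channels, cond_channels, 1)

    def forward(self, face_latent, sphere_map, skeleton_map):
        b, _, h, w = sphere_map.shape
        face = self.face_proj(face_latent)[:, :, None, None].expand(b, -1, h, w)
        sphere = self.sphere_conv(sphere_map)
        skeleton = self.skeleton_conv(skeleton_map)
        # Concatenate per-signal features and fuse into one control map
        # that would condition the animation generator.
        return self.fuse(torch.cat([face, sphere, skeleton], dim=1))

enc = HybridControlEncoder()
cond = enc(torch.randn(1, 512), torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(cond.shape)  # torch.Size([1, 64, 64, 64])
```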
no code implementations • 5 Dec 2024 • Yongming Zhu, Longhao Zhang, Zhengkun Rong, Tianshu Hu, Shuang Liang, Zhipeng Ge
The second stage learns the mapping from input dyadic audio to motion latent codes through denoising, enabling audio-driven head generation in interactive scenarios.
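A minimal sketch of such a second stage as a diffusion-style denoiser: noisy motion latents plus dyadic (two-speaker) audio features in, predicted noise out. All shapes, the epsilon-prediction objective, and the MLP backbone are assumptions, not the paper's design.

```python
# Hypothetical audio-to-motion denoiser for dyadic conversation latents.
import torch
import torch.nn as nn

class AudioToMotionDenoiser(nn.Module):
    def __init__(self, motion_dim=128, audio_dim=256, hidden=512):
        super().__init__()
        self.time_embed = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        # Dyadic audio: features of both interlocutors are concatenated.
        self.audio_proj = nn.Linear(2 * audio_dim, hidden)
        self.net = nn.Sequential(
            nn.Linear(motion_dim + 2 * hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, motion_dim))

    def forward(self, noisy_motion, t, audio_self, audio_other):
        temb = self.time_embed(t[:, None].float())
        aemb = self.audio_proj(torch.cat([audio_self, audio_other], dim=-1))
        # Predict the noise added to the motion latent codes.
        return self.net(torch.cat([noisy_motion, temb, aemb], dim=-1))

model = AudioToMotionDenoiser()
x = torch.randn(4, 128)            # noisy motion latent codes
t = torch.randint(0, 1000, (4,))   # diffusion timesteps
eps = model(x, t, torch.randn(4, 256), torch.randn(4, 256))
# Training would regress eps against the true injected noise.
```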
no code implementations • 9 Sep 2024 • Longhao Zhang, Shuang Liang, Zhipeng Ge, Tianshu Hu
In this paper, we present PersonaTalk, an attention-based two-stage framework comprising geometry construction and face rendering, for high-fidelity and personalized visual dubbing.
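A rough pipeline skeleton mirroring that geometry-then-rendering split is shown below; the interfaces, dimensions, and the use of style tokens as attention keys are guesses for illustration and do not reproduce PersonaTalk's modules.

```python
# Hypothetical two-stage skeleton: geometry construction, then rendering.
import torch
import torch.nn as nn

class GeometryStage(nn.Module):
    """Stage 1: drive facial geometry from audio via attention."""
    def __init__(self, audio_dim=256, geo_dim=204):  # e.g. 68 landmarks * 3
        super().__init__()
        self.attn = nn.MultiheadAttention(audio_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(audio_dim, geo_dim)

    def forward(self, audio_feats, style_tokens):
        # Audio queries attend to speaker-style tokens, a plausible way
        # to keep the predicted geometry personalized.
        fused, _ = self.attn(audio_feats, style_tokens, style_tokens)
        return self.head(fused)  # per-frame geometry codes

class RenderStage(nn.Module):
    """Stage 2: render face frames from the geometry codes."""
    def __init__(self, geo_dim=204, out_pixels=64 * 64 * 3):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(geo_dim, 1024), nn.SiLU(), nn.Linear(1024, out_pixels))

    def forward(self, geometry):
        b, t, _ = geometry.shape
        return self.decoder(geometry).view(b, t, 3, 64, 64)

geo = GeometryStage()(torch.randn(2, 16, 256), torch.randn(2, 8, 256))
frames = RenderStage()(geo)
print(frames.shape)  # torch.Size([2, 16, 3, 64, 64])
```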
1 code implementation • 12 Dec 2023 • Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang, Ming-Ming Cheng
This enhances mask-based editing in local areas; second, we present a novel distillation strategy, Conditional Distillation on Geometry and Texture (CDGT).
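The excerpt does not spell out CDGT's conditioning, so the following is only a hedged sketch of the general pattern of distilling a student against a teacher on both geometry and texture predictions; the loss form and weights are assumptions.

```python
# Hypothetical geometry-plus-texture distillation loss in the spirit of CDGT.
import torch
import torch.nn.functional as F

def distill_geometry_texture(student_geo, student_tex, teacher_geo, teacher_tex,
                             w_geo=1.0, w_tex=1.0):
    """Match the student to a frozen teacher on geometry (e.g. depth/density)
    and texture (e.g. RGB) outputs."""
    loss_geo = F.mse_loss(student_geo, teacher_geo.detach())
    loss_tex = F.mse_loss(student_tex, teacher_tex.detach())
    return w_geo * loss_geo + w_tex * loss_tex

s_geo = torch.randn(4, 1, 32, 32, requires_grad=True)
s_tex = torch.randn(4, 3, 32, 32, requires_grad=True)
loss = distill_geometry_texture(s_geo, s_tex,
                                torch.randn(4, 1, 32, 32), torch.randn(4, 3, 32, 32))
loss.backward()  # gradients flow only into the student tensors
```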
no code implementations • 4 Dec 2023 • Xusen Sun, Longhao Zhang, Hao Zhu, Peng Zhang, Bang Zhang, Xinya Ji, Kangneng Zhou, Daiheng Gao, Liefeng Bo, Xun Cao
Audio-driven talking head generation has drawn much attention in recent years, with many efforts devoted to accurate lip-sync, expressive facial animation, natural head pose generation, and high video quality.
no code implementations • CVPR 2023 • Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhigang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, Xuelong Li
Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image.
1 code implementation • CVPR 2022 • Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
More densely, depth is also utilized to learn 3D-aware cross-modal (i.e., appearance and depth) attention that guides the generation of motion fields for warping source-image representations.
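A minimal sketch of such 3D-aware cross-modal attention, assuming appearance features form the queries and depth features the keys and values, with the attended result decoded into a dense 2D motion field; layer choices and shapes are illustrative, not the paper's.

```python
# Hypothetical depth-guided cross-modal attention producing a motion field.
import torch
import torch.nn as nn

class DepthAwareAttention(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)  # queries from appearance features
        self.k = nn.Conv2d(dim, dim, 1)  # keys from depth features
        self.v = nn.Conv2d(dim, dim, 1)  # values from depth features
        self.to_flow = nn.Conv2d(dim, 2, 3, padding=1)  # (dx, dy) motion field

    def forward(self, appearance, depth):
        b, c, h, w = appearance.shape
        q = self.q(appearance).flatten(2).transpose(1, 2)  # (b, hw, c)
        k = self.k(depth).flatten(2)                       # (b, c, hw)
        v = self.v(depth).flatten(2).transpose(1, 2)       # (b, hw, c)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)     # (b, hw, hw)
        out = (attn @ v).transpose(1, 2).view(b, c, h, w)
        # Decode the depth-attended features into a dense flow for warping.
        return self.to_flow(out)

flow = DepthAwareAttention()(torch.randn(1, 128, 16, 16), torch.randn(1, 128, 16, 16))
print(flow.shape)  # torch.Size([1, 2, 16, 16])
```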