no code implementations • 28 Feb 2024 • Meidai Xuanyuan, Yuwang Wang, Honglei Guo, Qionghai Dai
To achieve this, we provide a two-stage, cross-modal controllable video generation pipeline that takes facial landmarks as an explicit and compact control signal to bridge the driving audio, the talking context, and the generated videos.
no code implementations • 9 Apr 2023 • Meidai Xuanyuan, Yuwang Wang, Honglei Guo, Xiao Ma, Yuchen Guo, Tao Yu, Qionghai Dai
To support this novel task, we further collect a character-centric multimodal dialogue dataset from TV shows, named Deep Personalized Character Dataset (DPCD).