To achieve this, we decouple appearance and motion information using a self-supervised formulation.
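As a rough illustration of such a self-supervised formulation, the sketch below pairs two frames from the same clip: an appearance encoder sees only the source frame, a motion encoder sees only the driving frame, and reconstructing the driving frame provides the training signal. All module names, layer sizes, and the toy decoder are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a self-supervised step that decouples appearance
# (from a source frame) and motion (from a driving frame).
# Modules and shapes are illustrative, not the paper's design.
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class MotionEncoder(nn.Module):
    """Predicts K 2-D keypoints from a frame (a stand-in motion code)."""
    def __init__(self, k=10):
        super().__init__()
        self.k = k
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2 * k),
        )
    def forward(self, x):
        return self.net(x).view(-1, self.k, 2).tanh()  # coords in [-1, 1]

appearance_enc, motion_enc = AppearanceEncoder(), MotionEncoder()
decoder = nn.Conv2d(64 + 20, 3, 3, padding=1)  # toy decoder head

# Self-supervision: source and driving frames come from the SAME video,
# so reconstructing the driving frame forces the split of roles.
source = torch.rand(4, 3, 64, 64)   # frame t
driving = torch.rand(4, 3, 64, 64)  # frame t' of the same clip

feats = appearance_enc(source)                 # appearance only
kp = motion_enc(driving).view(4, -1, 1, 1)     # motion only
kp_map = kp.expand(-1, -1, 64, 64)             # broadcast keypoints
recon = decoder(torch.cat([feats, kp_map], dim=1))

loss = nn.functional.l1_loss(recon, driving)   # reconstruct driving frame
loss.backward()
```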
We show that our framework can spatially transform the inputs efficiently. This is achieved through a deep architecture that decouples appearance and motion information.
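One common way to realize such a spatial transformation, sketched below under the assumption of a PyTorch implementation, is differentiable warping: a dense motion field offsets an identity sampling grid, and `grid_sample` warps the source accordingly. The random `flow` tensor here merely stands in for whatever a motion branch would predict.

```python
# Differentiable spatial transform of a source image via grid_sample.
# The flow field is random, purely for illustration.
import torch
import torch.nn.functional as F

source = torch.rand(1, 3, 64, 64)

# Identity sampling grid in [-1, 1], shape (1, H, W, 2), x before y.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, 64), torch.linspace(-1, 1, 64), indexing="ij"
)
identity = torch.stack([xs, ys], dim=-1).unsqueeze(0)

# Placeholder dense motion field (would come from the motion branch).
flow = 0.05 * torch.randn(1, 64, 64, 2)

warped = F.grid_sample(
    source, identity + flow, mode="bilinear",
    padding_mode="border", align_corners=True,
)
print(warped.shape)  # torch.Size([1, 3, 64, 64])
```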
The transformed mask is then decoded by a multi-scale generator that, conditioned on the source image, renders a realistic image in which the content of the source frame is animated by the pose in the driving video.
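A hedged sketch of what such a multi-scale generator could look like: the transformed mask is concatenated with the source image and decoded through progressively upsampled stages. The class name `MultiScaleGenerator`, the layer widths, and the single-channel mask are assumptions for illustration, not the paper's design.

```python
# Toy multi-scale generator: decode (source, mask) at increasing
# resolutions up to the full output size.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3 + 1, 64, 3, padding=1)   # source + mask
        self.up1 = nn.Conv2d(64, 32, 3, padding=1)
        self.up2 = nn.Conv2d(32, 16, 3, padding=1)
        self.to_rgb = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, source, mask):
        x = F.relu(self.enc(torch.cat([source, mask], dim=1)))  # 1/4 res
        x = F.relu(self.up1(F.interpolate(x, scale_factor=2)))  # 1/2 res
        x = F.relu(self.up2(F.interpolate(x, scale_factor=2)))  # full res
        return torch.sigmoid(self.to_rgb(x))

gen = MultiScaleGenerator()
source = torch.rand(1, 3, 16, 16)  # downsampled source image
mask = torch.rand(1, 1, 16, 16)    # transformed motion mask
frame = gen(source, mask)          # (1, 3, 64, 64) rendered frame
```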
In this work, we propose a novel deep learning approach to ultra-low bitrate video compression for video conferencing applications.
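To see why animation-based coding can reach ultra-low bitrates, consider a back-of-the-envelope estimate: the sender transmits one source frame once, then only a few quantized keypoints per frame. The figures below (`K`, `BITS_PER_COORD`, `FPS`) are assumed values for illustration, not measured results.

```python
# Rough bitrate estimate for keypoint-only transmission.
# All constants are illustrative assumptions, not measurements.
K = 10               # keypoints per frame (assumed)
BITS_PER_COORD = 16  # quantized x/y coordinate (assumed)
FPS = 25             # frame rate (assumed)

keypoint_bits_per_frame = K * 2 * BITS_PER_COORD
bitrate_bps = keypoint_bits_per_frame * FPS

print(f"{keypoint_bits_per_frame} bits/frame -> {bitrate_bps / 1000:.1f} kbps")
# 320 bits/frame -> 8.0 kbps, far below pixel-domain video coding
```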