Existing deep interactive colorization models have focused on utilizing various types of interactions, such as point-wise color hints, scribbles, or natural-language text, to reflect a user's intent at runtime.
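To make the hint-based interaction concrete, below is a minimal PyTorch sketch of one common way to inject point-wise color hints: the sparse ab hints and a binary hint mask are concatenated with the grayscale input as extra channels. The `HintColorizer` name and the layer sizes are illustrative assumptions, not the architecture of any specific model.

```python
import torch
import torch.nn as nn

class HintColorizer(nn.Module):
    """Illustrative hint-conditioned colorizer (hypothetical architecture)."""

    def __init__(self):
        super().__init__()
        # 1 (grayscale L) + 2 (sparse ab hints) + 1 (hint mask) = 4 input channels
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1),  # predicted ab channels
        )

    def forward(self, lum, hint_ab, hint_mask):
        # lum: (B,1,H,W); hint_ab: (B,2,H,W), zero except at hinted pixels;
        # hint_mask: (B,1,H,W) binary indicator of which pixels carry a hint
        x = torch.cat([lum, hint_ab, hint_mask], dim=1)
        return self.net(x)
```

The mask channel lets the network distinguish "no hint" from "hint of value zero", which is why sparse interactions are usually encoded as a value map plus an indicator map rather than the value map alone.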
To perform unconditional video generation, we must learn the distribution of real-world videos.
We present a novel Animation CelebHeads dataset (AnimeCeleb) to address animation head reenactment.
Learning visual representations from large-scale unlabelled images is a holy grail for most computer vision tasks.
Video generation models often operate under the assumption of a fixed frame rate, which leads to suboptimal performance in handling flexible frame rates (e.g., increasing the frame rate of the more dynamic portion of a video, or handling missing video frames).
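As a sketch of one way to move beyond a fixed frame rate, a generator can be conditioned on continuous timestamps rather than discrete frame indices, so that densifying a dynamic interval or filling in a missing frame both reduce to extra timestamp queries. The `TimeConditionedDecoder` below is a hypothetical toy model, not a method from the literature.

```python
import torch
import torch.nn as nn

class TimeConditionedDecoder(nn.Module):
    """Toy decoder mapping a video-level latent plus a timestamp to a frame."""

    def __init__(self, z_dim=128, t_dim=32):
        super().__init__()
        self.time_embed = nn.Sequential(nn.Linear(1, t_dim), nn.ReLU())
        self.decode = nn.Sequential(
            nn.Linear(z_dim + t_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * 32 * 32),  # toy 32x32 RGB frame
        )

    def forward(self, z, t):
        # z: (B, z_dim) video-level latent; t: (B,) timestamps in [0, 1]
        emb = self.time_embed(t.unsqueeze(-1))
        frames = self.decode(torch.cat([z, emb], dim=-1))
        return frames.view(-1, 3, 32, 32)

# Sampling the same latent at a denser grid of timestamps yields a higher
# frame rate for the chosen interval; a missing frame is just another query.
z = torch.randn(1, 128)
dense_t = torch.linspace(0.4, 0.6, steps=16)  # densify a dynamic portion
frames = TimeConditionedDecoder()(z.expand(16, -1), dense_t)
```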
However, it is difficult to prepare a training dataset that contains a sufficient number of semantically meaningful image pairs along with the ground-truth colored images reflecting a given reference (e.g., coloring a sketch of an originally blue car given a green car as the reference).
Despite recent advancements in deep learning-based automatic colorization, existing models remain limited when it comes to few-shot learning.