We present VideoGen: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos.
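As a minimal sketch of what likelihood-based video modeling can look like in practice (not VideoGen's actual architecture), the toy model below discretizes frames into tokens and trains an autoregressive model on the exact factorized negative log-likelihood p(x) = ∏_t p(x_t | x_<t). The GRU backbone, vocabulary size, and token shapes are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy autoregressive model over discretized frame tokens: the video's
# likelihood factorizes as p(x) = prod_t p(x_t | x_<t), and training
# minimizes the exact negative log-likelihood (cross-entropy).
vocab_size, hidden = 512, 128   # placeholder sizes, not from the paper

embed = nn.Embedding(vocab_size, hidden)
rnn = nn.GRU(hidden, hidden, batch_first=True)
head = nn.Linear(hidden, vocab_size)

tokens = torch.randint(0, vocab_size, (4, 16))   # 4 clips, 16 tokens each
h, _ = rnn(embed(tokens[:, :-1]))                # predict token t from tokens < t
logits = head(h)
nll = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
print(nll.item())                                # mean NLL per token, in nats
```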
Learning disentangled representations yields interpretable models and facilitates data generation with style transfer, a problem that has been studied extensively on static data such as images in unsupervised learning frameworks.
Predicting the future frames of a video is a challenging task, in part because of the stochasticity of the underlying real-world phenomena.
Our approach, Diverse Video Generator, uses a Gaussian process (GP) to learn priors on future states given the past, and maintains a probability distribution over possible futures given a particular sample.
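The NumPy sketch below illustrates the core idea of conditioning a GP prior on observed past states and drawing several distinct future trajectories. It is not the Diverse Video Generator implementation; the RBF kernel, the 1-D toy latent, and all hyperparameters are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=2.0, variance=1.0):
    """Squared-exponential kernel over frame indices."""
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def sample_futures(past_t, past_z, future_t, n_samples=5, noise=1e-4):
    """Sample diverse future latent trajectories from the GP posterior
    conditioned on the observed past latents."""
    K = rbf_kernel(past_t, past_t) + noise * np.eye(len(past_t))
    K_s = rbf_kernel(past_t, future_t)
    K_ss = rbf_kernel(future_t, future_t)
    K_inv = np.linalg.inv(K)
    mean = K_s.T @ K_inv @ past_z                # posterior mean
    cov = K_ss - K_s.T @ K_inv @ K_s             # posterior covariance
    cov += noise * np.eye(len(future_t))         # jitter for stability
    return np.random.multivariate_normal(mean, cov, size=n_samples)

# Condition on 5 observed past frames, then sample 5 candidate futures.
past_t = np.arange(5, dtype=float)
past_z = np.sin(past_t * 0.5)                    # toy 1-D latent trajectory
future_t = np.arange(5, 15, dtype=float)
futures = sample_futures(past_t, past_z, future_t)
print(futures.shape)                             # (5, 10): five distinct continuations
```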
We introduce a motion generator that discovers the desired motion trajectory while keeping content and motion disentangled.
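As a rough sketch of content/motion disentanglement (in the spirit of MoCoGAN-style generators rather than this paper's exact model), the module below holds one content code fixed across a clip while an RNN unrolls a per-frame motion trajectory; layer sizes and names are illustrative placeholders.

```python
import torch
import torch.nn as nn

class MotionContentGenerator(nn.Module):
    """One content code per clip, one motion code per frame."""
    def __init__(self, content_dim=64, motion_dim=16, frame_dim=128):
        super().__init__()
        self.motion_rnn = nn.GRU(motion_dim, motion_dim, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(content_dim + motion_dim, 256),
            nn.ReLU(),
            nn.Linear(256, frame_dim),
        )

    def forward(self, z_content, z_motion_noise):
        # z_content: (B, content_dim), held fixed across the clip
        # z_motion_noise: (B, T, motion_dim), fresh noise per frame
        motion_traj, _ = self.motion_rnn(z_motion_noise)    # (B, T, motion_dim)
        T = motion_traj.size(1)
        content = z_content.unsqueeze(1).expand(-1, T, -1)  # broadcast over time
        return self.decoder(torch.cat([content, motion_traj], dim=-1))

gen = MotionContentGenerator()
frames = gen(torch.randn(2, 64), torch.randn(2, 8, 16))
print(frames.shape)  # torch.Size([2, 8, 128]): 8 frames per clip
```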
Recent works in self-supervised video prediction have mainly focused on passive forecasting and low-level action-conditional prediction, both of which sidestep the problem of semantic learning.
Indeed, a system able to generate only a single talking face would come across as almost robotic.
To be truly understandable and accepted by Deaf communities, an automatic Sign Language Production (SLP) system must generate a photo-realistic signer.
Generative Adversarial Networks have recently shown promise for video generation, building on the success of image generation while also addressing a new challenge: time.
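The main structural change that time forces on a GAN is sketched below, assuming a simple convolutional critic: an image discriminator scores individual frames, while a video discriminator also convolves over the temporal axis, so it can penalize implausible motion. The layer choices are placeholders, not any particular paper's design.

```python
import torch
import torch.nn as nn

# A 2-D discriminator judges single frames; a video discriminator must
# also judge motion, so it convolves over the time axis as well.
image_disc = nn.Sequential(                      # input: (B, 3, H, W)
    nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, kernel_size=4),
)

video_disc = nn.Sequential(                      # input: (B, 3, T, H, W)
    nn.Conv3d(3, 32, kernel_size=4, stride=2, padding=1),  # strides over T too
    nn.LeakyReLU(0.2),
    nn.Conv3d(32, 1, kernel_size=(2, 4, 4)),
)

clip = torch.randn(1, 3, 8, 64, 64)              # one 8-frame RGB clip
print(image_disc(clip[:, :, 0]).shape)           # score for a single frame
print(video_disc(clip).shape)                    # score over the whole clip
```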