Generative Video Models

TrIVD-GAN, or Transformation-based & TrIple Video Discriminator GAN, is a generative adversarial network for video generation that builds upon DVD-GAN. Its improvements are a novel transformation-based recurrent unit, the Transformation-based Spatial Recurrent Unit (TSRU), which makes the generator more expressive, and an improved discriminator architecture.

In contrast with DVD-GAN, TrIVD-GAN splits the roles of the discriminators differently: $\mathcal{D}_{S}$ judges per-frame global structure, while $\mathcal{D}_{T}$ critiques local spatiotemporal structure. This is achieved by downsampling the $k$ randomly sampled frames fed to $\mathcal{D}_{S}$ by a factor $s$, and by cropping $T \times H/s \times W/s$ clips from the high-resolution video fed to $\mathcal{D}_{T}$, where $T$, $H$, $W$ and $C$ denote the time, height, width and channel dimensions of the input. Relative to DVD-GAN, this further reduces the number of pixels to process per video, from $k \times H \times W + T \times H/s \times W/s$ to $\left(k + T\right) \times H/s \times W/s$.
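
The pixel counts above can be checked with a small sketch. It is only an illustration of the arithmetic and of the sampling step, not the authors' implementation: the function names, the choice of uniform random crop placement, and the example sizes ($k = 8$, $T = 16$, $H = W = 128$, $s = 2$) are assumptions made here, and $H$ and $W$ are assumed to be divisible by $s$.

```python
import random


def pixels_dvd_gan(k, T, H, W, s):
    """Pixels per video for the first expression in the text:
    k * H * W + T * (H / s) * (W / s)."""
    return k * H * W + T * (H // s) * (W // s)


def pixels_trivd_gan(k, T, H, W, s):
    """Pixels per video for the second expression in the text:
    (k + T) * (H / s) * (W / s)."""
    return (k + T) * (H // s) * (W // s)


def sample_discriminator_inputs(T, H, W, k, s, rng=random):
    """Hypothetical helper: choose which k frames D_S would see and where
    the T x H/s x W/s crop for D_T would sit. Returns indices only.
    Uniform placement of the crop is an assumption of this sketch."""
    frame_ids = sorted(rng.sample(range(T), k))  # k distinct frames, needs k <= T
    crop_h, crop_w = H // s, W // s
    top = rng.randint(0, H - crop_h)    # inclusive bounds keep the crop inside
    left = rng.randint(0, W - crop_w)
    return frame_ids, (top, left, crop_h, crop_w)


if __name__ == "__main__":
    k, T, H, W, s = 8, 16, 128, 128, 2  # illustrative sizes, not from the paper
    print(pixels_dvd_gan(k, T, H, W, s))    # 196608
    print(pixels_trivd_gan(k, T, H, W, s))  # 98304
    print(sample_discriminator_inputs(T, H, W, k, s))
```

With these example sizes the second expression is half the first. The saving comes entirely from the $\mathcal{D}_{S}$ term, which shrinks by a factor of $s^2$, while the $\mathcal{D}_{T}$ term is unchanged.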

Source: Transformation-based Adversarial Video Prediction on Large-Scale Data



| Task | Papers | Share |
|---|---|---|
| Video Generation | 1 | 50.00% |
| Video Prediction | 1 | 50.00% |