
A LAPGAN, or Laplacian Generative Adversarial Network, is a type of generative adversarial network that generates images through a Laplacian pyramid representation. In the sampling procedure following training, we have a set of generative convnet models {$G_{0}, \dots , G_{K}$}, each of which captures the distribution of coefficients $h_{k}$ for natural images at a different level of the Laplacian pyramid. Sampling an image is akin to reconstructing it from its pyramid, except that the generative models are used to produce the $h_{k}$’s:

$$ \tilde{I}_{k} = u\left(\tilde{I}_{k+1}\right) + \tilde{h}_{k} = u\left(\tilde{I}_{k+1}\right) + G_{k}\left(z_{k}, u\left(\tilde{I}_{k+1}\right)\right)$$

The recurrence starts by setting $\tilde{I}_{K+1} = 0$ and using the model at the final level $G_{K}$ to generate a residual image $\tilde{I}_{K}$ using noise vector $z_{K}$: $\tilde{I}_{K} = G_{K}\left(z_{K}\right)$. Models at all levels except the final are conditional generative models that take an upsampled version of the current image $\tilde{I}_{k+1}$ as a conditioning variable, in addition to the noise vector $z_{k}$.
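The coarse-to-fine recurrence above can be sketched in a few lines of NumPy. This is only an illustration of the control flow, under stated assumptions: `upsample` uses plain pixel repetition as a stand-in for $u(\cdot)$, and `make_toy_generator` is a hypothetical placeholder that returns scaled noise where a trained convnet $G_{k}$ would be used.

```python
import numpy as np

def upsample(img):
    """Stand-in for u(.): doubles height and width by pixel repetition (an assumption)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def make_toy_generator(scale):
    """Hypothetical placeholder for a trained G_k; a real G_k would be a convnet."""
    def g(z, cond=None):
        # Ignores the conditioning image and returns scaled noise as the "residual".
        return scale * z
    return g

def sample_lapgan(generators, base_size, rng):
    """Run I_k = u(I_{k+1}) + G_k(z_k, u(I_{k+1})) from k = K down to k = 0."""
    K = len(generators) - 1
    # Coarsest level: I_K = G_K(z_K), with no conditioning image.
    z = rng.standard_normal((base_size, base_size))
    image = generators[K](z)
    for k in range(K - 1, -1, -1):
        low = upsample(image)                # l_k = u(I_{k+1})
        z = rng.standard_normal(low.shape)   # noise vector z_k
        image = low + generators[k](z, low)  # add the generated coefficients h_k
    return image

rng = np.random.default_rng(0)
generators = [make_toy_generator(0.1) for _ in range(3)]  # G_0, G_1, G_2
sample = sample_lapgan(generators, base_size=8, rng=rng)
print(sample.shape)  # (32, 32)
```

With three levels and an 8×8 coarsest image, each refinement step doubles the resolution, so the final sample is 32×32.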

The generative models {$G_{0}, \dots, G_{K}$} are trained using the conditional GAN (CGAN) approach at each level of the pyramid. Specifically, we construct a Laplacian pyramid from each training image $I$. At each level we make a stochastic choice (with equal probability) to either (i) construct the coefficients $h_{k}$ using the standard Laplacian pyramid coefficient generation procedure or (ii) generate them using $G_{k}$:

$$ \tilde{h}_{k} = G_{k}\left(z_{k}, u\left(I_{k+1}\right)\right) $$

Here $G_{k}$ is a convnet which uses a coarse scale version of the image $l_{k} = u\left(I_{k+1}\right)$ as an input, as well as noise vector $z_{k}$. The discriminator $D_{k}$ takes as input $h_{k}$ or $\tilde{h}_{k}$, along with the low-pass image $l_{k}$ (which is explicitly added to $h_{k}$ or $\tilde{h}_{k}$ before the first convolution layer), and predicts whether the image was real or generated. At the final scale of the pyramid, the low frequency residual is sufficiently small that it can be directly modeled with a standard GAN: $\tilde{h}_{K} = G_{K}\left(z_{K}\right)$ and $D_{K}$ only has $h_{K}$ or $\tilde{h}_{K}$ as input.
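The construction of one training example for $D_{k}$ at a non-final level can be sketched as follows. This is a minimal illustration under assumptions: `downsample` uses 2×2 average pooling and `upsample` uses pixel repetition as stand-ins for the pyramid operators, and `toy_generator` is a hypothetical placeholder for a trained $G_{k}$. The helper names are illustrative, not taken from the paper.

```python
import numpy as np

def downsample(img):
    """Stand-in for the pyramid downsampling: 2x2 average pooling (even sizes only)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Stand-in for u(.): doubles height and width by pixel repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(image, levels):
    """Return coefficients [h_0, ..., h_K] and images [I_0, ..., I_K], with h_K = I_K."""
    gaussian = [image]
    for _ in range(levels):
        gaussian.append(downsample(gaussian[-1]))
    # h_k = I_k - u(I_{k+1}) for k < K; the final level keeps the low-frequency residual.
    coeffs = [gaussian[k] - upsample(gaussian[k + 1]) for k in range(levels)]
    coeffs.append(gaussian[-1])
    return coeffs, gaussian

def discriminator_example(k, coeffs, gaussian, generator, rng):
    """Build one (input, label) pair for D_k at a non-final level k."""
    low = upsample(gaussian[k + 1])          # l_k = u(I_{k+1})
    if rng.random() < 0.5:
        h, label = coeffs[k], 1              # (i) real coefficients h_k
    else:
        z = rng.standard_normal(low.shape)
        h, label = generator(z, low), 0      # (ii) generated coefficients from G_k
    # The low-pass image is added to the coefficients before D_k's first layer.
    return h + low, label

rng = np.random.default_rng(0)
image = rng.random((32, 32))
coeffs, gaussian = laplacian_pyramid(image, levels=2)
toy_generator = lambda z, low: 0.1 * z       # hypothetical placeholder for G_k
d_input, label = discriminator_example(0, coeffs, gaussian, toy_generator, rng)
```

When the real branch is chosen, `d_input` is exactly the original image at level $k$, since $h_{k} + l_{k} = I_{k}$. That is why the discriminator can be read as judging whether a full image at that scale looks real.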

Breaking the generation into successive refinements is the key idea. We give up any “global” notion of fidelity: no network is ever trained to discriminate between the output of the full cascade and a real image. Instead, the focus is on making each step plausible.

Source: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
