Learning Energy-Based Generative Models via Coarse-to-Fine Expanding and Sampling

ICLR 2021  ·  Yang Zhao, Jianwen Xie, Ping Li

Energy-based models (EBMs) for generative modeling parameterize a single network and can be trained directly by maximum likelihood estimation. Despite this simplicity and tractability, current approaches are either unstable to train or unable to synthesize diverse, high-fidelity images. We propose to train EBMs via a multistage coarse-to-fine expanding and sampling strategy, namely CF-EBM. To improve the learning procedure, we construct an effective network architecture and advocate applying smooth activations. The resulting approach is computationally efficient and achieves the best image-generation performance among EBMs and the spectral normalization GAN. Furthermore, we provide a recipe yielding the first EBM to successfully synthesize $256\times256$-pixel images. Finally, we effortlessly generalize CF-EBM to one-sided unsupervised image-to-image translation and beat baseline methods while reducing the model size by $1000\times$ and the training budget by $9\times$. In parallel, we present a gradient-based discriminative saliency method to explicitly interpret the translation dynamics, which aligns with human behavior.
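Maximum-likelihood EBM training of this kind relies on an MCMC inner loop, typically short-run Langevin dynamics, to draw negative samples from the current energy function; in the coarse-to-fine scheme, such sampling runs at progressively higher resolutions as the network expands. A minimal sketch of the Langevin step on a toy quadratic energy follows; the function names, step settings, and energy are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def langevin_sample(energy_grad, x0, step_size=0.01, n_steps=2000, rng=None):
    """Short-run Langevin dynamics:
        x_{k+1} = x_k - (s/2) * dE/dx(x_k) + sqrt(s) * eps,  eps ~ N(0, I).
    `energy_grad` is the gradient of the energy; in a real EBM it would be
    computed by backpropagation through the energy network (assumed here).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = x0.copy()
    for _ in range(n_steps):
        x = (x
             - 0.5 * step_size * energy_grad(x)
             + np.sqrt(step_size) * rng.standard_normal(x.shape))
    return x

# Toy energy E(x) = ||x||^2 / 2, so grad E(x) = x; the chain's stationary
# distribution is (approximately) the standard normal N(0, I).
grad_E = lambda x: x
samples = langevin_sample(grad_E, x0=np.full((1000, 2), 5.0))
```

In training, the samples returned by such a chain serve as "negative" examples whose energy is pushed up, while the energy of real data is pushed down, which is the contrastive gradient of the maximum-likelihood objective.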
