Understanding Over-parameterization in Generative Adversarial Networks

A broad class of unsupervised deep learning methods such as Generative Adversarial Networks (GANs) involves training over-parameterized models, in which the number of model parameters exceeds the size of the training dataset. Indeed, most successful GANs used in practice are trained with over-parameterized generator and discriminator networks, both in terms of depth and width. A large body of work in supervised learning has shown the importance of such model over-parameterization for the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting, and GANs in particular, involves non-convex concave min-max optimization problems that are often trained using alternating Gradient Descent/Ascent (GDA). The role and benefits of model over-parameterization in the convergence of GDA to a global saddle point in non-convex concave problems are far less understood. In this work, we present a comprehensive analysis of the importance of model over-parameterization in GANs, both theoretically and empirically. We theoretically show that in an over-parameterized GAN model with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem. To the best of our knowledge, this is the first global-convergence result for GDA in such settings. Our theory is based on a more general result that holds for a broader class of nonlinear generators and discriminators satisfying certain assumptions (including deeper generators and random-feature discriminators). Our theory utilizes and builds upon a novel connection with the convergence analysis of linear time-varying dynamical systems, which may have broader implications for understanding the convergence behavior of GDA in non-convex concave problems involving over-parameterized models.
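To make the setting concrete, the following is a minimal toy sketch (not the paper's exact model or proof setting) of alternating GDA on a non-convex concave min-max problem with an over-parameterized one-hidden-layer generator and a linear discriminator. All names, dimensions, step sizes, and the L2-regularized Wasserstein-style objective are illustrative assumptions; here the dynamics simply drive the generated mean toward the real data mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup (assumed, not from the paper):
# real data x ~ N(mu, I) in R^d; generator G(z) = W relu(A z) has hidden
# width k >> d (over-parameterized); discriminator is linear, D(x) = v.x,
# with an L2 penalty on v so the inner maximization is concave and bounded.
d, k, nz = 2, 64, 8
mu = np.array([3.0, -1.0])            # mean of the "real" distribution

A = rng.normal(size=(k, nz)) / np.sqrt(nz)   # fixed random first layer
W = rng.normal(size=(d, k)) * 0.1            # trained generator weights
v = np.zeros(d)                              # linear discriminator weights

def gen(W, z):
    return W @ np.maximum(A @ z, 0.0)        # G(z) = W relu(A z)

eta_g, eta_d, lam, batch = 0.05, 0.2, 1.0, 256
for step in range(2000):
    z = rng.normal(size=(nz, batch))
    x = mu[:, None] + rng.normal(size=(d, batch))
    h = np.maximum(A @ z, 0.0)               # hidden activations, (k, batch)
    fake = W @ h
    # objective: min_W max_v  v.(E[x] - E[G(z)]) - (lam/2) ||v||^2
    grad_v = x.mean(axis=1) - fake.mean(axis=1) - lam * v
    v = v + eta_d * grad_v                   # ascent step on discriminator
    grad_W = -np.outer(v, h.mean(axis=1))    # dL/dW for the generator
    W = W - eta_g * grad_W                   # descent step on generator

# mean of generated samples should approach mu
print(np.round(gen(W, rng.normal(size=(nz, 2000))).mean(axis=1), 2))
```

In this toy problem the regularizer makes the inner maximization strongly concave, so alternating GDA contracts toward the saddle point; the paper's contribution is establishing global convergence without such simplifications, via a reduction to linear time-varying dynamical systems.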
We also empirically study the role of model over-parameterization in GANs through several large-scale experiments on the CIFAR-10 and CelebA datasets. Our experiments show that over-parameterization improves the quality of generated samples across various model architectures and datasets. Remarkably, we observe that over-parameterization leads to faster and more stable convergence of GDA across the board.
