Learning One-hidden-layer Neural Networks on Gaussian Mixture Models with Guaranteed Generalizability

1 Jan 2021  ·  Hongkang Li, Shuai Zhang, Meng Wang

We analyze the problem of learning fully connected neural networks with the sigmoid activation function for binary classification from the perspective of model estimation. The outputs are assumed to be generated by a ground-truth neural network with unknown parameters, and the learning objective is to estimate these parameters by minimizing a non-convex cross-entropy loss function over the training data. Instead of following the conventional and restrictive assumption in the literature that the input features follow the standard Gaussian distribution, this paper, for the first time, analyzes the more general and practical scenario in which the input features follow a Gaussian mixture model composed of a finite number of Gaussian distributions with different means and variances. We propose a gradient descent algorithm with a tensor initialization approach and show that it converges linearly to a critical point whose distance to the ground-truth model diminishes, with guaranteed generalizability. We characterize the number of samples required for successful convergence, referred to as the sample complexity, as a function of the parameters of the Gaussian mixture model. We prove analytically that when any mean or variance in the mixture model is large, or when all variances are close to zero, the sample complexity increases and the convergence slows down, indicating a more challenging learning problem. Although it focuses on one-hidden-layer neural networks, this paper provides the first theoretical analysis of the impact of the parameters of the input distribution on learning performance.
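
To make the setup concrete, below is a minimal sketch (not the authors' implementation) of the learning problem the abstract describes: inputs drawn from a Gaussian mixture model, binary labels generated by a ground-truth one-hidden-layer sigmoid network, and plain gradient descent on the cross-entropy loss. All dimensions, mixture parameters, the averaging output layer, and the random initialization are illustrative assumptions; in particular, the paper proposes a tensor-based initialization rather than the random one used here.

```python
# Minimal sketch of the learning setup, under the assumptions stated above.
import numpy as np

rng = np.random.default_rng(0)
d, K, L, n = 10, 5, 3, 2000  # input dim, hidden neurons, mixture components, samples

# --- Gaussian mixture inputs: L components with distinct means and variances ---
weights = rng.dirichlet(np.ones(L))          # mixing proportions
means = rng.normal(0.0, 1.0, size=(L, d))    # component means
stds = rng.uniform(0.5, 1.5, size=L)         # component standard deviations
comp = rng.choice(L, size=n, p=weights)      # component assignment per sample
X = means[comp] + stds[comp, None] * rng.normal(size=(n, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, X):
    # One-hidden-layer network: average of K sigmoid units, output in (0, 1)
    return sigmoid(X @ W).mean(axis=1)

# --- Binary labels generated by a ground-truth network with unknown W_star ---
W_star = rng.normal(size=(d, K))
y = (rng.uniform(size=n) < forward(W_star, X)).astype(float)

def cross_entropy(W):
    p = np.clip(forward(W, X), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(W):
    # Gradient of the empirical cross-entropy loss w.r.t. W (shape d x K),
    # with p = (1/K) * sum_k sigmoid(x^T w_k):
    #   dloss/dp = (p - y) / (p (1 - p)),  dp/dw_k = (1/K) s_k (1 - s_k) x
    p = forward(W, X)
    s = sigmoid(X @ W)  # n x K hidden activations
    coeff = ((p - y) / np.clip(p * (1 - p), 1e-12, None))[:, None]
    return X.T @ (coeff * s * (1 - s)) / (n * K)

# Plain random initialization (the paper uses tensor initialization instead)
W = rng.normal(size=(d, K))
lr = 0.5
for t in range(500):
    W -= lr * gradient(W)

print(f"final cross-entropy loss: {cross_entropy(W):.4f}")
```

The sketch reports only the final loss: the hidden neurons are identifiable only up to permutation, so a direct distance between W and W_star would require matching columns first.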
