A partial theory of Wide Neural Networks using WC functions and its practical implications

29 Sep 2021  ·  Dario Balboni, Davide Bacciu

We present a framework based on the theory of Polyak-Łojasiewicz functions to explain the convergence and generalization properties of overparameterized feed-forward neural networks. We introduce the class of Well-Conditioned (WC) reparameterizations, which are closed under composition and preserve the class of Polyak-Łojasiewicz functions; this makes the framework compositional, so that its results can be studied separately for each layer and in an architecture-neutral way. We show that overparameterized neural layers are WC and can therefore be composed to build easily optimizable functions. We establish a pointwise stability bound implying that overparameterization in WC models leads to tighter convergence around a global minimizer. Our framework allows us to derive quantitative estimates for the terms that govern the optimization process of neural networks. We leverage this aspect to empirically evaluate the predictions set forth by several relevant published theories concerning the conditioning, training speed, and generalization of the neural network training process. Our contribution aims to encourage the development of mixed theoretical-practical approaches, in which the properties postulated by the theory can also be confirmed empirically.
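For context, the following is a minimal sketch of the standard Polyak-Łojasiewicz (PL) condition and its classical linear-convergence consequence for gradient descent; the constants mu and L and the step size eta = 1/L below are generic assumptions from the standard formulation, not notation taken from this paper, whose specific definitions of WC reparameterizations are not reproduced here.

% Standard PL condition and linear convergence of gradient descent (background sketch).
\documentclass{article}
\usepackage{amsmath}
\begin{document}

A differentiable function $f$ with minimum value $f^{*}$ satisfies the PL condition
with constant $\mu > 0$ if
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr)
  \quad \text{for all } x .
\]
If $f$ is additionally $L$-smooth, gradient descent with step size $\eta = 1/L$,
$x_{k+1} = x_{k} - \eta \nabla f(x_{k})$, converges linearly:
\[
  f(x_{k}) - f^{*} \;\le\; \Bigl( 1 - \tfrac{\mu}{L} \Bigr)^{k} \bigl( f(x_{0}) - f^{*} \bigr).
\]

\end{document}

Note that, unlike strong convexity, the PL condition does not require a unique minimizer, which is why it is a natural fit for overparameterized models whose global minima form a manifold.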
