A global analysis of global optimisation

10 Oct 2022  ·  Lachlan Ewen MacDonald, Hemanth Saratchandran, Jack Valmadre, Simon Lucey

We introduce a general theoretical framework, designed for the study of gradient optimisation of deep neural networks, that encompasses ubiquitous architectural choices including batch normalisation, weight normalisation and skip connections. We use our framework to conduct a global analysis of the curvature and regularity properties of neural network loss landscapes induced by normalisation layers and skip connections respectively. We then demonstrate the utility of this framework in two respects. First, we give the only proof of which we are presently aware that a class of deep neural networks can be trained using gradient descent to global optima even when such optima only exist at infinity, as is the case for the cross-entropy cost. Second, we verify a prediction made by the theory, that skip connections accelerate training, with ResNets on MNIST, CIFAR10, CIFAR100 and ImageNet.
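
To make the phrase "optima only exist at infinity" concrete, here is a minimal one-dimensional sketch. It is not the paper's setting or method: it assumes a single positive example whose logit is a scalar weight `w`, so the cross-entropy cost reduces to log(1 + exp(-w)). The function names `logistic_loss`, `logistic_grad` and `run_gd` are hypothetical, as are the step size and starting point. The cost is strictly positive for every finite `w` and tends to 0 only as `w` grows without bound, so gradient descent keeps increasing `w` and never reaches a finite minimiser.

```python
import math


def logistic_loss(w: float) -> float:
    """Cross-entropy for one positive example with scalar logit w."""
    return math.log1p(math.exp(-w))


def logistic_grad(w: float) -> float:
    """Derivative of log(1 + exp(-w)) with respect to w."""
    return -1.0 / (1.0 + math.exp(w))


def run_gd(steps: int, lr: float = 1.0, w0: float = 0.0):
    """Plain gradient descent; returns a list of (w, loss) pairs, one per step."""
    w = w0
    history = []
    for _ in range(steps):
        history.append((w, logistic_loss(w)))
        w -= lr * logistic_grad(w)  # the gradient is negative, so w always increases
    return history


if __name__ == "__main__":
    # Print a few iterates: w keeps growing while the loss shrinks towards 0.
    for step, (w, loss) in enumerate(run_gd(200)):
        if step % 50 == 0:
            print(f"step {step:3d}  w = {w:.4f}  loss = {loss:.6f}")
```

In this toy case the infimum of the cost is 0 but no finite `w` attains it, which is the sense in which the optimum sits at infinity.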
