Weight-space symmetry in neural network loss landscapes revisited

25 Sep 2019  ·  Berfin Simsek, Johanni Brea, Bernd Illing, Wulfram Gerstner ·

Neural network training depends on the structure of the underlying loss landscape, i.e. local minima, saddle points, flat plateaus, and loss barriers. In relation to the structure of the landscape, we study the permutation symmetry of neurons in each layer of a deep neural network, which gives rise not only to multiple equivalent global minima of the loss function but also to critical points in between partner minima. In a network of $d-1$ hidden layers with $n_k$ neurons in layers $k = 1, \ldots, d$, we construct continuous paths between equivalent global minima that lead through a `permutation point' where the input and output weight vectors of two neurons in the same hidden layer $k$ collide and interchange. We show that such permutation points are critical points which lie inside high-dimensional subspaces of equal loss, contributing to the global flatness of the landscape. We also find that a permutation point for the exchange of neurons $i$ and $j$ transits into a flat high-dimensional plateau that enables all $n_k!$ permutations of neurons in a given layer $k$ at the same loss value. Moreover, we introduce higher-order permutation points by exploiting the hierarchical structure in the loss landscapes of neural networks, and find that the number of $K$-th order permutation points is much larger than the (already huge) number of equivalent global minima -- at least by a polynomial factor of order $K$. In two tasks, we demonstrate numerically with our path finding method that continuous paths between partner minima exist: first, in a toy network with a single hidden layer on a function approximation task and, second, in a multilayer network on the MNIST task. Our geometric approach yields a lower bound on the number of critical points generated by weight-space symmetries and provides a simple intuitive link between previous theoretical results and numerical observations.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here