On the relationship between topology and gradient propagation in deep networks
In this paper, we address two fundamental research questions in neural architecture design: (i) How does the architecture topology impact gradient flow during training? (ii) Can certain topological characteristics of deep networks indicate a priori (i.e., without training) which models, despite having different numbers of parameters/FLOPs/layers, achieve a similar accuracy? To this end, we formulate the problem of deep learning architecture design from a network science perspective and introduce a new metric called NN-Mass to quantify how effectively information flows through a given architecture. We establish a theoretical link between NN-Mass, a topological property of neural architectures, and gradient flow characteristics (e.g., Layerwise Dynamical Isometry). As such, NN-Mass can identify models that achieve similar accuracy despite having significantly different size/compute requirements. Detailed experiments on both synthetic and real datasets (e.g., MNIST, CIFAR-10, CIFAR-100, ImageNet) provide extensive evidence for our insights. Finally, we show that the closed-form equation of our theoretically grounded NN-Mass metric enables us to design efficient architectures directly, without time-consuming training or search.
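The abstract does not reproduce the closed-form NN-Mass equation itself. The sketch below illustrates, under stated assumptions, one way a topological metric of this kind could be computed for a network built as a chain of cells with DenseNet-type skip connections. The per-cell formula (width × depth × skip-connection density), the `Cell` fields, and the all-pairs upper bound on possible skips are assumptions for illustration only, not the paper's exact definition.

```python
# Hedged sketch: an NN-Mass-style topology metric for a chain of cells with
# DenseNet-type skip connections. The per-cell formula used here
# (width * depth * skip-connection density) is an illustrative assumption.
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    depth: int      # number of layers in the cell
    width: int      # channel width shared by layers in the cell
    num_skips: int  # number of skip connections actually present in the cell


def cell_density(cell: Cell) -> float:
    """Fraction of realized skip connections out of all possible ones.

    Assumes an all-pairs upper bound of depth*(depth-1)/2 possible skips.
    """
    max_skips = cell.depth * (cell.depth - 1) // 2
    return cell.num_skips / max_skips if max_skips > 0 else 0.0


def nn_mass(cells: List[Cell]) -> float:
    """Sum over cells of width * depth * skip density (assumed closed form)."""
    return sum(c.width * c.depth * cell_density(c) for c in cells)


# Example: two architectures of different size/compute can score similarly,
# which is the kind of comparison such a metric is meant to enable.
arch_a = [Cell(depth=10, width=32, num_skips=20), Cell(depth=10, width=64, num_skips=20)]
arch_b = [Cell(depth=16, width=24, num_skips=48), Cell(depth=16, width=48, num_skips=48)]
print(nn_mass(arch_a), nn_mass(arch_b))
```

Because the metric depends only on the connectivity pattern (not on trained weights), it can be evaluated in closed form before any training, which is what makes training-free architecture comparison possible in principle.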