Are All Layers Created Equal?

6 Feb 2019 · Chiyuan Zhang, Samy Bengio, Yoram Singer

Understanding deep neural networks has been a major research objective in recent years with notable theoretical progress. A focal point of those studies stems from the success of excessively large networks which defy the classical wisdom of uniform convergence and learnability... We study empirically the layer-wise functional structure of overparameterized deep models. We provide evidence for the heterogeneous characteristic of layers. To do so, we introduce the notion of robustness to post-training re-initialization and re-randomization. We show that the layers can be categorized as either "ambient" or "critical". Resetting the ambient layers to their initial values has no negative consequence, and in many cases they barely change throughout training. In contrast, resetting the critical layers completely destroys the predictor and the performance drops to chance. Our study provides further evidence that mere parameter counting or norm accounting is too coarse in studying generalization of deep models, and flatness or robustness analysis of the models needs to respect the network architectures.
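The re-initialization test described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or experimental setup: the function `layer_robustness`, the two-layer toy model, the dataset `DATA`, and all parameter values are hypothetical and chosen only to show the procedure. The idea is to take a trained model, reset one layer at a time to its initial values while keeping every other layer trained, and re-evaluate.

```python
import copy


def layer_robustness(init_params, trained_params, evaluate, resample=None):
    """Reset one layer at a time and score the resulting model.

    init_params / trained_params: dicts mapping layer name -> list of weights.
    evaluate: callable taking a params dict and returning a score.
    resample: optional callable(name) -> fresh weights (re-randomization);
              when None, the layer is reset to its initial values
              (re-initialization).
    """
    scores = {}
    for name in trained_params:
        probe = copy.deepcopy(trained_params)  # all other layers stay trained
        if resample is None:
            probe[name] = copy.deepcopy(init_params[name])
        else:
            probe[name] = resample(name)
        scores[name] = evaluate(probe)
    return scores


# Hypothetical toy data: label is the sign of 3*x0 - 2*x1.
DATA = [((1, 0), 1), ((0, 1), -1), ((1, 1), 1),
        ((-1, 0), -1), ((0, -1), 1), ((-1, -1), -1)]


def accuracy(params):
    """Toy two-layer model: sign(sum_i layer1[i] * layer2[i] * x[i])."""
    correct = 0
    for x, label in DATA:
        out = sum(a * b * xi
                  for a, b, xi in zip(params["layer1"], params["layer2"], x))
        correct += (1 if out > 0 else -1) == label
    return correct / len(DATA)


# Hypothetical values: layer1 barely moves in training, layer2 moves a lot.
init = {"layer1": [1.0, 1.0], "layer2": [0.1, 0.1]}
trained = {"layer1": [1.0, 1.01], "layer2": [3.0, -2.0]}

scores = layer_robustness(init, trained, accuracy)
```

In this toy setting, resetting `layer1` leaves accuracy unchanged, so it behaves like an ambient layer, while resetting `layer2` lowers accuracy, so it behaves like a critical layer. Passing a `resample` callable that draws fresh random weights gives the re-randomization variant of the same test.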
