On Residual Networks Learning a Perturbation from Identity

11 Feb 2019 · Michael Hauser

The purpose of this work is to test and study the hypothesis that residual networks are learning a perturbation from identity. Residual networks are enormously important deep learning models, and many theories attempt to explain how they function; learning a perturbation from identity is one such theory. To test this hypothesis, the magnitudes of the perturbations are measured both in an absolute sense and in a scaled sense, with each form having its relative benefits and drawbacks. Additionally, a stopping rule is developed that can decide the depth of the residual network based on the average perturbation magnitude falling below a given epsilon. This analysis yields a better understanding of how residual networks process and transform data from input to output. Parallel experiments are conducted on MNIST as well as CIFAR10 for residual networks of various sizes, with between 6 and 300 residual blocks. It is found that, in this setting, the average scaled perturbation magnitude is roughly inversely proportional to the number of residual blocks, from which it follows that sufficiently deep residual networks are indeed learning a perturbation from identity.
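For concreteness, the sketch below (in PyTorch) computes, for a residual block x_{l+1} = x_l + F(x_l), the two quantities the abstract describes: the absolute perturbation magnitude ||F(x_l)|| and the scaled magnitude ||F(x_l)|| / ||x_l||, together with one possible reading of the epsilon stopping rule. The toy ResidualBlock, the Frobenius-norm convention, and the running-average form of the rule are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Toy block computing x_{l+1} = x_l + F(x_l); stands in for the paper's blocks."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)

def perturbation_magnitudes(blocks, x):
    """Per-block absolute ||F(x_l)|| and scaled ||F(x_l)|| / ||x_l|| magnitudes."""
    absolute, scaled = [], []
    with torch.no_grad():
        for block in blocks:
            residual = block.f(x)              # the perturbation F(x_l)
            abs_mag = residual.norm().item()   # absolute magnitude (Frobenius norm)
            absolute.append(abs_mag)
            scaled.append(abs_mag / (x.norm().item() + 1e-12))  # scaled by ||x_l||
            x = x + residual                   # advance to x_{l+1}
    return absolute, scaled

def depth_by_stopping_rule(scaled, eps):
    """One reading of the stopping rule (an assumption): the first depth at which
    the running average of the scaled perturbation magnitudes drops below eps."""
    total = 0.0
    for depth, s in enumerate(scaled, start=1):
        total += s
        if total / depth < eps:
            return depth
    return len(scaled)  # rule never triggered; keep the full network

blocks = nn.ModuleList(ResidualBlock(64) for _ in range(30))
x = torch.randn(8, 64)
absolute, scaled = perturbation_magnitudes(blocks, x)
print(depth_by_stopping_rule(scaled, eps=0.5))  # threshold is illustrative
```

With untrained random blocks the magnitudes carry no particular trend; the paper's inverse-proportionality finding concerns trained networks, where the same measurements would be taken after training.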
