Emergence of Implicit Filter Sparsity in Convolutional Neural Networks
We show implicit filter level sparsity manifests in convolutional neural networks (CNNs) which employ Batch Normalization and ReLU activation, and are trained using adaptive gradient descent techniques with L2 regularization or weight decay. Through an extensive empirical study (Anonymous, 2019) we hypothesize the mechanism be hind the sparsification process. We find that the interplay of various phenomena influences the strength of L2 and weight decay regularizers, leading the supposedly non sparsity inducing regularizers to induce filter sparsity. In this workshop article we summarize some of our key findings and experiments, and present additional results on modern network architectures such as ResNet-50.
PDF Abstract