Structured Pruning Meets Orthogonality

29 Sep 2021 · Huan Wang, Yun Fu

Several recent works have empirically found that the finetuning learning rate is crucial to the final performance in structured neural network pruning. It has been shown that the \emph{dynamical isometry} broken by pruning accounts for this phenomenon. How to develop a filter pruning method that maintains or recovers dynamical isometry \emph{and} is scalable to modern deep networks has remained elusive. In this paper, we present \emph{orthogonality preserving pruning} (OPP), a regularization-based structured pruning method that maintains dynamical isometry during pruning. Specifically, OPP regularizes the Gram matrix of the convolutional kernels to encourage kernel orthogonality among the important filters while driving the unimportant weights towards zero. We also propose to regularize the batch-normalization parameters to better preserve dynamical isometry across the whole network. Empirically, OPP can compete with the \emph{ideal} dynamical isometry recovery method on linear networks. On non-linear networks (ResNet56/VGG19 on the CIFAR datasets), it outperforms the available solutions \emph{by a large margin}. Moreover, OPP also works effectively with modern deep networks (ResNets) on ImageNet, delivering encouraging performance in comparison to many recent filter pruning methods. To the best of our knowledge, this is the \emph{first} method that effectively maintains dynamical isometry during pruning for \emph{large-scale} deep neural networks.
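To make the idea above concrete, below is a minimal PyTorch-style sketch of a Gram-matrix penalty that encourages orthogonality among the kept filters while decaying the filters marked as unimportant. The function name `opp_regularizer`, the `unimportant_idx` argument, and the coefficients `lambda_orth` / `lambda_decay` are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def opp_regularizer(conv_weight, unimportant_idx, lambda_orth=1e-3, lambda_decay=1e-2):
    """Hypothetical sketch of an orthogonality-preserving pruning penalty.

    conv_weight:      (out_channels, in_channels, k, k) kernels of one conv layer
    unimportant_idx:  indices of filters scheduled for removal
    """
    n = conv_weight.shape[0]
    w = conv_weight.reshape(n, -1)            # flatten each filter into a row vector

    # Gram matrix of the flattened kernels; orthogonal rows mean W W^T = I
    gram = w @ w.t()

    # Boolean mask selecting the filters that are kept (important)
    keep = torch.ones(n, dtype=torch.bool, device=w.device)
    keep[unimportant_idx] = False
    num_keep = int(keep.sum().item())

    # Encourage orthogonality among the kept filters
    eye = torch.eye(num_keep, device=w.device)
    orth_loss = ((gram[keep][:, keep] - eye) ** 2).sum()

    # Drive the unimportant filters towards zero
    decay_loss = (w[~keep] ** 2).sum()

    return lambda_orth * orth_loss + lambda_decay * decay_loss
```

In use, such a term would simply be added to the task loss during the pruning stage, e.g. `loss = criterion(output, target) + opp_regularizer(conv.weight, idx)`, with an analogous (assumed) penalty on batch-normalization scale parameters to preserve isometry across the whole network.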
