SPARK: co-exploring model SPArsity and low-RanKness for compact neural networks
Sparsification and low-rank decomposition are two important techniques for deep neural network (DNN) compression. To date, these two popular yet distinct approaches have typically been applied separately, and their efficient integration for better compression performance remains little explored. In this paper we perform a systematic co-exploration of model sparsity and low-rankness towards compact neural networks. We first investigate and analyze several important design factors for joint pruning and low-rank factorization, including the operational sequence, the low-rank format, and the optimization objective. Based on the observations and outcomes of this analysis, we then propose SPARK, a unified DNN compression framework that simultaneously captures model SPArsity and low-RanKness in an efficient way. Empirical experiments demonstrate the very promising performance of our proposed solution. Notably, on the CIFAR-10 dataset, our approach brings 1.25%, 1.02% and 0.16% accuracy increases over the baseline ResNet-20, ResNet-56 and DenseNet-40 models, respectively, while reducing storage and computational costs by 70.4% and 71.1% (ResNet-20), 37.5% and 39.3% (ResNet-56), and 52.4% and 61.3% (DenseNet-40), respectively. On the ImageNet dataset, our approach enables a 0.52% accuracy increase over the baseline model with 48.7% fewer parameters.
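To make the general idea concrete, the following is a minimal sketch of one generic way to combine low-rank factorization with magnitude-based pruning on a single weight matrix (approximating W as a low-rank term plus a sparse residual). This is only an illustration of the kind of joint sparsity/low-rankness structure the abstract refers to, not the SPARK algorithm itself; the rank, keep ratio, and function names are illustrative assumptions.

```python
import numpy as np

def lowrank_plus_sparse(W, rank=8, keep_ratio=0.05):
    """Approximate W ~= L + S, where L is a rank-`rank` factorization and S
    keeps only the largest-magnitude residual entries (magnitude pruning).
    Illustrative sketch only; SPARK's joint optimization is defined in the paper."""
    # Truncated SVD gives the best rank-r approximation in Frobenius norm.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

    # Keep only the largest-magnitude entries of the residual as the sparse part.
    R = W - L
    k = max(1, int(keep_ratio * R.size))
    thresh = np.partition(np.abs(R).ravel(), -k)[-k]
    S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

# Example: a 256x512 layer. Storage becomes rank*(m+n) numbers for the
# low-rank factors plus the nonzeros of S, versus m*n for the dense matrix.
W = np.random.randn(256, 512).astype(np.float32)
L, S = lowrank_plus_sparse(W, rank=16, keep_ratio=0.02)
rel_err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```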