Integral Pruning on Activations and Weights for Efficient Neural Networks

ICLR 2019 · Qing Yang, Wei Wen, Zuoguan Wang, Yiran Chen, Hai Li

With the rapid scaling up of deep neural networks (DNNs), extensive research on network model compression, such as weight pruning, has been performed for efficient deployment. This work aims to advance compression beyond the weights to the activations of DNNs. We propose the Integral Pruning (IP) technique, which integrates activation pruning with weight pruning. By learning the differing importance of neuron responses and connections, the generated network, namely IPnet, balances the sparsity between activations and weights and thereby further improves execution efficiency. The feasibility and effectiveness of IPnet are thoroughly evaluated on various network models with different activation functions and on different datasets. With less than 0.5% disturbance to the testing accuracy, IPnet saves 71.1% to 96.35% of the computation cost of the original dense models, with up to 5.8x and 10x reductions in activation and weight counts, respectively.
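The abstract does not spell out the pruning criteria, so the following is only a minimal NumPy sketch of the general idea of combining weight sparsity with activation sparsity. It assumes simple magnitude thresholds; the function names, sparsity targets, and threshold values are illustrative assumptions, not the authors' exact IP procedure.

```python
import numpy as np

def prune_weights(weights, sparsity):
    """Zero out the smallest-magnitude weights to reach the target sparsity (assumed criterion)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def prune_activations(activations, threshold):
    """Zero out activations whose magnitude falls below a threshold (assumed criterion)."""
    return np.where(np.abs(activations) >= threshold, activations, 0.0)

# Toy layer: dense weights and a batch of ReLU activations from a previous layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))
x = np.maximum(rng.normal(size=(64, 256)), 0.0)

W_pruned, w_mask = prune_weights(W, sparsity=0.9)   # hypothetical 10x weight reduction
x_pruned = prune_activations(x, threshold=0.5)      # hypothetical activation threshold

# The effective multiply-accumulate count shrinks with the product of the two densities,
# which is why pruning activations on top of weights compounds the savings.
weight_density = w_mask.mean()
act_density = np.count_nonzero(x_pruned) / x_pruned.size
print(f"weight density {weight_density:.2%}, activation density {act_density:.2%}, "
      f"estimated MAC fraction {weight_density * act_density:.2%}")
```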
