Effective Training of Sparse Neural Networks under Global Sparsity Constraint

1 Jan 2021  ·  Xiao Zhou, Weizhong Zhang, Tong Zhang

Weight pruning is an effective technique for reducing the model size and inference time of deep neural networks in real-world deployments. While earlier approaches pruned networks after dense training, recent work has shown that directly training sparse neural networks is more effective. However, both kinds of methods face a common limitation: the pruning rate must be tuned individually for each layer, either manually or with handcrafted heuristic rules, because the magnitudes of the weight-importance scores under their criteria differ substantially across layers. This paper proposes an effective method called ${\it probabilistic~masking}$ (ProbMask) to train sparse neural networks from scratch, based on a natural formulation under a global sparsity constraint. The key observation is that probability can serve as a global criterion for measuring weight importance across all layers. We can therefore encode weight sparsity as an explicit global constraint on the probabilities, leading to a constrained continuous minimization problem over the probability space. An appealing feature of ProbMask is that the amount of weight redundancy in each layer is learned automatically through this constraint, which avoids tuning pruning rates layer by layer. Extensive experimental results demonstrate that our method is much more effective than state-of-the-art methods and outperforms them by a significant margin at high pruning rates.
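To make the idea concrete, below is a minimal sketch of probabilistic masking with a single global probability budget shared by all layers. It is not the authors' implementation: the straight-through Bernoulli sampling, the bisection-based projection, and all names (ProbMaskLinear, project_global_budget, keep_ratio) are illustrative assumptions, intended only to show how one global constraint on keep-probabilities can replace per-layer pruning rates.

```python
# Hypothetical sketch of probabilistic masking under a global sparsity constraint.
# Not the authors' algorithm; sampling scheme and projection are assumptions.
import torch
import torch.nn as nn

class ProbMaskLinear(nn.Module):
    """Linear layer whose weights are gated by learnable keep-probabilities."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        # Unconstrained scores; sigmoid maps them to keep-probabilities in (0, 1).
        self.scores = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        p = torch.sigmoid(self.scores)
        # Straight-through estimator: hard Bernoulli sample in the forward pass,
        # identity gradient with respect to p in the backward pass.
        mask = torch.bernoulli(p).detach() + p - p.detach()
        return nn.functional.linear(x, self.weight * mask)

@torch.no_grad()
def project_global_budget(layers, keep_ratio):
    """Shift all scores by a common constant so that the mean keep-probability
    over *all* layers matches the global budget (one constraint, no per-layer rates)."""
    scores = torch.cat([layer.scores.flatten() for layer in layers])
    lo, hi = -20.0, 20.0
    for _ in range(50):  # bisection on the common shift
        mid = (lo + hi) / 2
        if torch.sigmoid(scores + mid).mean() > keep_ratio:
            hi = mid
        else:
            lo = mid
    for layer in layers:
        layer.scores.add_(mid)

# Usage (illustrative): after each optimizer step, enforce e.g. 90% global sparsity:
# masked = [m for m in model.modules() if isinstance(m, ProbMaskLinear)]
# project_global_budget(masked, keep_ratio=0.1)
```

Because the constraint acts on the concatenated probabilities of every layer at once, layers whose weights matter less end up with lower keep-probabilities automatically, which is the behavior the abstract describes.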
