Understanding Diversity Based Neural Network Pruning in Teacher Student Setup

Despite a multitude of empirical advances, there is a lack of theoretical understanding of the effectiveness of different pruning methods. We inspect different pruning techniques under the statistical mechanics formulation of a teacher-student framework and derive their generalization error (GE) bounds. In the first part, we theoretically justify the empirical observations of a recent work showing that the Determinantal Point Process (DPP) based node pruning method is notably superior to competing approaches when tested on real datasets. In the second part, we use our theoretical setup to prove that the baseline random edge pruning method performs better than the DPP node pruning method, consistent with the finding in the literature that sparse neural networks (edge pruned) generalize better than dense neural networks (node pruned) for a fixed number of parameters.
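For intuition, here is a minimal sketch contrasting the two pruning schemes on a single layer at a matched parameter budget. The helper names `dpp_node_prune` and `random_edge_prune`, the linear similarity kernel, and the greedy MAP approximation to k-DPP sampling are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def dpp_node_prune(W, k, rng=None):
    """Greedy MAP approximation of a k-DPP over hidden nodes.

    W: (hidden, input) weight matrix of the layer to prune.
    k: number of hidden nodes to keep.
    Returns indices of the retained (mutually diverse) nodes.
    """
    rng = np.random.default_rng(rng)
    # Similarity kernel over nodes: L_ij = <w_i, w_j> (linear kernel).
    L = W @ W.T
    selected, remaining = [], list(range(W.shape[0]))
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            # Volume (log-determinant) of the candidate subset's kernel block.
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:  # kernel block degenerate; fall back to random pick
            best = rng.choice(remaining)
        selected.append(best)
        remaining.remove(best)
    return np.array(selected)

def random_edge_prune(W, n_params, rng=None):
    """Baseline: zero out weights uniformly at random, keeping n_params edges."""
    rng = np.random.default_rng(rng)
    mask = np.zeros(W.size, dtype=bool)
    mask[rng.choice(W.size, size=n_params, replace=False)] = True
    return W * mask.reshape(W.shape)

# Compare the two schemes at the same parameter count.
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 50))                      # 20 hidden nodes, 50 inputs
keep_nodes = 8
W_node = W[dpp_node_prune(W, keep_nodes)]              # dense layer, fewer nodes
W_edge = random_edge_prune(W, keep_nodes * 50, rng=0)  # sparse layer, all nodes
print(W_node.shape, np.count_nonzero(W_edge))          # (8, 50) and 400 edges
```

At equal parameter counts, node pruning yields a smaller dense layer while edge pruning keeps all nodes but sparsifies their connections; this is the comparison the second part of the paper formalizes.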
