Understanding Diversity Based Neural Network Pruning in Teacher Student Setup

Despite a multitude of empirical advances, there is a lack of theoretical understanding of the effectiveness of different pruning methods. We inspect different pruning techniques under the statistical mechanics formulation of a teacher-student framework and derive their generalization error (GE) bounds. In the first part, we theoretically justify the empirical observations of a recent work showing that the Determinantal Point Process (DPP) based node pruning method is notably superior to competing approaches when tested on real datasets. In the second part, we use our theoretical setup to prove that the baseline random edge pruning method performs better than the DPP node pruning method, consistent with the finding in the literature that sparse neural networks (edge pruned) generalize better than dense neural networks (node pruned) for a fixed number of parameters.
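For intuition, here is a minimal sketch contrasting the two pruning schemes on a single layer at a matched parameter budget. The helper names `dpp_node_prune` and `random_edge_prune`, the linear similarity kernel, and the greedy MAP approximation to k-DPP sampling are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def dpp_node_prune(W, k, rng=None):
    """Greedy MAP approximation of a k-DPP over hidden nodes.

    W: (hidden, input) weight matrix of the layer to prune.
    k: number of hidden nodes to keep.
    Returns indices of the retained (mutually diverse) nodes.
    """
    rng = np.random.default_rng(rng)
    # Similarity kernel over nodes: L_ij = <w_i, w_j> (linear kernel).
    L = W @ W.T
    selected, remaining = [], list(range(W.shape[0]))
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            # Volume (log-determinant) of the candidate subset's kernel block.
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:  # kernel block degenerate; fall back to random pick
            best = rng.choice(remaining)
        selected.append(best)
        remaining.remove(best)
    return np.array(selected)

def random_edge_prune(W, n_params, rng=None):
    """Baseline: zero out weights uniformly at random, keeping n_params edges."""
    rng = np.random.default_rng(rng)
    mask = np.zeros(W.size, dtype=bool)
    mask[rng.choice(W.size, size=n_params, replace=False)] = True
    return W * mask.reshape(W.shape)

# Compare the two schemes at the same parameter count.
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 50))                      # 20 hidden nodes, 50 inputs
keep_nodes = 8
W_node = W[dpp_node_prune(W, keep_nodes)]              # dense layer, fewer nodes
W_edge = random_edge_prune(W, keep_nodes * 50, rng=0)  # sparse layer, all nodes
print(W_node.shape, np.count_nonzero(W_edge))          # (8, 50) and 400 edges
```

At equal parameter counts, node pruning yields a smaller dense layer while edge pruning keeps all nodes but sparsifies their connections; this is the comparison the second part of the paper formalizes.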
