Distilling Knowledge via Knowledge Review

Knowledge distillation transfers knowledge from a teacher network to a student network, with the goal of substantially improving the student's performance. Previous methods mostly focus on designing feature transformations and loss functions between features at the same level to improve effectiveness. We instead study the connection paths across levels between the teacher and student networks, and reveal their great importance. For the first time in knowledge distillation, cross-stage connection paths are proposed. The resulting review mechanism is effective and structurally simple. The final nested and compact framework requires negligible computational overhead and outperforms other methods on a variety of tasks. We apply our method to classification, object detection, and instance segmentation, and all of these tasks show significant improvement in student network performance. Code is available at https://github.com/Jia-Research-Lab/ReviewKD

CVPR 2021
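The abstract describes the core idea at a high level: instead of matching teacher and student features only at the same stage, the student's features are connected to teacher features across stages. Below is a minimal PyTorch sketch of that cross-stage review idea, assuming a simple additive fusion and an MSE loss in place of the paper's attention-based fusion (ABF) and hierarchical context loss (HCL); the module name ReviewSketchLoss and the mid_channels parameter are illustrative and not taken from the released code.

```python
# Simplified sketch of cross-stage review distillation (not the authors' exact
# implementation). Student features are fused from deep to shallow, and each
# fused feature is matched against the teacher feature at that stage.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReviewSketchLoss(nn.Module):
    def __init__(self, student_channels, teacher_channels, mid_channels=64):
        super().__init__()
        # Project every student stage into a shared feature space ...
        self.align = nn.ModuleList(
            nn.Conv2d(c, mid_channels, kernel_size=1) for c in student_channels
        )
        # ... and back out to the matching teacher stage's channel count.
        self.out = nn.ModuleList(
            nn.Conv2d(mid_channels, c, kernel_size=3, padding=1)
            for c in teacher_channels
        )

    def forward(self, student_feats, teacher_feats):
        # Feature lists are ordered shallow -> deep. Iterating from deep to
        # shallow lets each shallower stage "review" deeper knowledge, which
        # creates the cross-stage connection paths between student and teacher.
        loss, fused = 0.0, None
        stages = list(zip(student_feats, teacher_feats, self.align, self.out))
        for s_feat, t_feat, align, out in reversed(stages):
            x = align(s_feat)
            if fused is not None:
                # Crude stand-in for ABF: upsample the deeper fused feature
                # and add it to the current stage.
                fused = x + F.interpolate(fused, size=x.shape[-2:], mode="nearest")
            else:
                fused = x
            # Plain MSE against the frozen teacher feature stands in for HCL.
            y = out(fused)
            if y.shape[-2:] != t_feat.shape[-2:]:
                y = F.interpolate(y, size=t_feat.shape[-2:], mode="nearest")
            loss = loss + F.mse_loss(y, t_feat.detach())
        return loss


# Hypothetical usage for a 3-stage backbone pair:
# kd_loss = ReviewSketchLoss([64, 128, 256], [128, 256, 512])
# loss = kd_loss(student_feats, teacher_feats)  # lists of NCHW tensors, shallow -> deep
```

Fusing the student's features from deep to shallow means every teacher stage is compared against a student representation that already carries deeper knowledge, which is one compact way to realize cross-stage connections without a quadratic number of pairwise losses.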

Results from the Paper


Task                   | Dataset   | Model                                          | Metric Name        | Metric Value | Global Rank
Knowledge Distillation | CIFAR-100 | resnet8x4 (T: resnet32x4, S: resnet8x4)        | Top-1 Accuracy (%) | 75.63        | # 12
Knowledge Distillation | CIFAR-100 | vgg8 (T: vgg13, S: vgg8)                       | Top-1 Accuracy (%) | 74.84        | # 15
Knowledge Distillation | ImageNet  | Knowledge Review (T: ResNet-34, S: ResNet-18)  | Top-1 Accuracy (%) | 71.61        | # 33 (# 1 under the CRD training setting)

Methods


No methods listed for this paper.