Self-Knowledge Distillation with Progressive Refinement of Targets

The generalization capability of deep neural networks has been substantially improved by a wide spectrum of regularization methods, e.g., restricting the function space, injecting randomness during training, and augmenting data. In this work, we propose a simple yet effective regularization method named progressive self-knowledge distillation (PS-KD), which progressively distills a model's own knowledge to soften hard targets (i.e., one-hot vectors) during training. It can therefore be interpreted within the knowledge distillation framework as a student that becomes its own teacher. Specifically, targets are adjusted adaptively by combining the ground truth with past predictions from the model itself. We show that PS-KD provides a hard example mining effect by rescaling gradients according to how difficult each example is to classify. The proposed method is applicable to any supervised learning task with hard targets and can be easily combined with existing regularization methods to further enhance generalization performance. Furthermore, we confirm that PS-KD not only achieves better accuracy but also provides high-quality confidence estimates in terms of both calibration and ordinal ranking. Extensive experimental results on three different tasks (image classification, object detection, and machine translation) demonstrate that our method consistently improves the performance of state-of-the-art baselines. The code is available at
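The target-softening step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`soften_target`, `alpha_at`, `logit_gradient`), the linear schedule for the mixing weight, and the value `alpha_final=0.8` are assumptions made for illustration only. The sketch blends a one-hot label with a stored prediction from an earlier epoch and shows that the cross-entropy gradient with respect to the logits is the current prediction minus the softened target.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alpha_at(epoch, total_epochs, alpha_final=0.8):
    """Mixing weight for past predictions.

    Assumption: grows linearly from 0 to alpha_final over training,
    so early (unreliable) predictions contribute little.
    """
    return alpha_final * epoch / total_epochs


def soften_target(one_hot, past_probs, alpha):
    """Convex combination of the hard target and the model's past prediction."""
    return [(1.0 - alpha) * y + alpha * p for y, p in zip(one_hot, past_probs)]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a (soft) target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def logit_gradient(target, probs):
    """Gradient of softmax cross-entropy w.r.t. the logits: probs - target."""
    return [p - t for p, t in zip(probs, target)]


# Hypothetical three-class example; the true class is index 1.
one_hot = [0.0, 1.0, 0.0]
past_probs = softmax([1.0, 2.0, 0.5])     # prediction stored from the previous epoch
current_probs = softmax([0.8, 2.5, 0.3])  # prediction at the current epoch

alpha = alpha_at(epoch=5, total_epochs=10)
soft_target = soften_target(one_hot, past_probs, alpha)
loss = cross_entropy(soft_target, current_probs)
grad = logit_gradient(soft_target, current_probs)
```

Because the softened target is still a probability distribution, the gradient components sum to zero, and the component for the true class shrinks in magnitude as the past prediction for that class grows more confident. Under these assumptions, examples the model already handled well contribute smaller updates than examples it found difficult.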

ICCV 2021

Results from the Paper

Ranked #1 on Multimodal Machine Translation on Multi30K (BLEU (DE-EN) metric)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | CIFAR-100 | PyramidNet-200 + ShakeDrop + CutMix + PS-KD | Percentage correct | 86.41 | #52 |
| Image Classification | ImageNet | PS-KD (ResNet-152 + CutMix) | Top-1 accuracy | 79.24% | #685 |
| Machine Translation | IWSLT2015 English-German | PS-KD | BLEU score | 30.00 | #1 |
| Machine Translation | IWSLT2015 German-English | PS-KD | BLEU score | 36.20 | #1 |
| Multimodal Machine Translation | Multi30K | PS-KD | BLEU (DE-EN) | 32.3 | #1 |
| Object Detection | PASCAL VOC 2007 | PS-KD (ResNet-152, CutMix) | mAP | 79.7% | #11 |