Rectifying the Data Bias in Knowledge Distillation

Knowledge distillation is a representative technique for model compression and acceleration, which is important for deploying neural networks on resource-limited devices. The knowledge transferred from teacher to student is the mapping realized by the teacher model, which can be represented by the set of all its input-output pairs. In practice, however, the student model learns only from the data pairs in the dataset, which may be biased, and we argue that this limits the performance of knowledge distillation. In this paper, we first quantitatively define the uniformity of the sampled training data, providing a unified view of methods that learn from biased data. We then evaluate the uniformity on real-world data and show that existing methods in fact improve the uniformity of the data. We further introduce two uniformity-oriented methods for rectifying the data bias in knowledge distillation. Extensive experiments on Face Recognition and Person Re-identification demonstrate the effectiveness of our approach. Moreover, we analyze the sampled data on Face Recognition and show that a better balance is achieved between races and between easy and hard samples. This effect is also confirmed when training the student model from scratch, which yields performance comparable to standard knowledge distillation.
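
To make the point about sampled input-output pairs concrete, here is a minimal sketch of a distillation objective estimated on a batch. It assumes the student is trained to match teacher embeddings under a squared L2 distance; that loss choice and the function name `embedding_distillation_loss` are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

def embedding_distillation_loss(student_emb, teacher_emb):
    """Mean squared L2 distance between student and teacher embeddings.

    Assumption: both arrays have shape (batch, dim) and were computed on the
    same sampled inputs. The loss only "sees" the teacher's mapping at those
    sampled inputs, which is why the sampling distribution matters.
    """
    student = np.asarray(student_emb, dtype=float)
    teacher = np.asarray(teacher_emb, dtype=float)
    if student.shape != teacher.shape:
        raise ValueError("student and teacher embeddings must have the same shape")
    # Squared L2 distance per sample, then an average over the batch.
    per_sample = np.sum((student - teacher) ** 2, axis=1)
    return float(per_sample.mean())
```

Because the average runs only over the inputs that were actually drawn, regions of the input space that the dataset under-represents contribute little to this estimate.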


Results from the Paper


 Ranked #1 on Face Verification on IJB-C (training dataset metric)

Task               Dataset  Model            Metric Name       Metric Value   Global Rank
Face Verification  IJB-C    L2E+IS-sampling  TAR @ FAR=1e-3    97.05%         # 5
                                             TAR @ FAR=1e-4    95.49%         # 17
                                             TAR @ FAR=1e-5    93.25%         # 9
                                             training dataset  MS1M V3        # 1
                                             model             MobileFaceNet  # 1
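
For readers unfamiliar with the TAR @ FAR metrics reported above, the following is a minimal sketch of how a true accept rate at a fixed false accept rate can be computed from similarity scores. The function name `tar_at_far`, the strict "score > threshold" accept rule, and the tie handling are assumptions made for illustration; the official IJB-C evaluation protocol may differ in such details.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far):
    """True accept rate at a threshold whose false accept rate is at most `far`.

    genuine_scores: similarity scores of same-identity pairs.
    impostor_scores: similarity scores of different-identity pairs.
    A pair is accepted when its score is strictly above the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    # Sort impostor scores from highest to lowest.
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]
    # Number of impostor pairs that may be accepted without exceeding `far`.
    allowed = int(np.floor(far * impostor.size))
    if allowed >= impostor.size:
        threshold = -np.inf  # every pair is accepted
    else:
        # Only impostors scoring strictly above this value are accepted,
        # so at most `allowed` of them pass.
        threshold = impostor[allowed]
    return float(np.mean(genuine > threshold))
```

Under this reading, TAR @ FAR=1e-4 is the fraction of genuine pairs accepted when the threshold admits at most one impostor pair in ten thousand.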
