1 code implementation • 18 Nov 2019 • Tiancheng Wen, Shenqi Lai, Xueming Qian
Knowledge distillation (KD) is widely used to train a compact student model under the supervision of a larger teacher model, which can effectively improve the student's performance.
Knowledge Distillation
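For context, a minimal sketch of the classic soft-target distillation objective that the abstract describes (Hinton et al., 2015), assuming PyTorch; the temperature `T` and blend weight `alpha` here are illustrative defaults, not values taken from this paper:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard soft-target KD loss: KL divergence between the
    temperature-softened teacher and student distributions, blended
    with ordinary cross-entropy on the hard labels."""
    # Soften both output distributions with temperature T.
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    # T^2 rescales the soft-target gradients so their magnitude stays
    # comparable to the hard-label term as T varies.
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In a training loop, the teacher runs in evaluation mode under `torch.no_grad()` to produce `teacher_logits`, and only the student's parameters receive gradients from this loss.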