Current Knowledge Distillation (KD) methods for semantic segmentation often guide the student to mimic the teacher's structured information generated from individual data samples.
Audio and visual front-ends are first trained on large-scale unimodal datasets; components of both front-ends are then integrated into a larger multimodal framework that learns to transcribe parallel audio-visual data into characters through a combination of CTC and seq2seq decoding.
The teacher network's outputs serve as soft labels that supervise the training of the student network.
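A minimal sketch of this soft-label supervision, following the classic temperature-scaled KL-divergence formulation of distillation (function names and the temperature value here are illustrative, not taken from any of the papers above):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T yields a softer distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened outputs, scaled by
    # T^2 so gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, T)   # soft labels from the teacher
    q = softmax(student_logits, T)   # student's softened predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()
```

The loss vanishes when the student reproduces the teacher's logits exactly and grows as the two distributions diverge, which is what makes the teacher's soft outputs usable as a training signal.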
In this paper, we investigate the bias-variance tradeoff brought by distillation with soft labels.
In this paper, we propose a novel network design mechanism for efficient embedded computing.