1 code implementation • 19 Jul 2022 • Bytasandram Yaswanth Reddy, Shiv Ram Dubey, Rakesh Kumar Sanodiya, Ravi Ranjan Prasad Karn
Existing knowledge distillation methods use logits and other features of the deep (teacher) model as knowledge for the compact (student) model, which requires the teacher network to be trained on the same context in parallel with the student model.
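For reference, a minimal sketch of the conventional logit-based distillation the abstract alludes to (Hinton et al.'s softened-logit KD, not this paper's method); the function name, `temperature`, and `alpha` are illustrative choices, not from the source:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    """Logit-based KD: KL divergence between temperature-softened
    teacher and student distributions, blended with cross-entropy
    on the ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # Scale the KL term by T^2 to keep gradient magnitudes comparable
    # across temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce
```

Note that computing `teacher_logits` at training time is exactly why the teacher must be available, trained on the same data, alongside the student.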