26 Feb 2021 • Reyhan Kevser Keser, Aydin Ayanzadeh, Omid Abdollahi Aghdam, Caglar Kilcioglu, Behcet Ugur Toreyin, Nazim Kemal Ure
One of the most efficient methods for model compression is hint distillation, where the student model is injected with information (hints) from several different layers of the teacher model.
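As a rough illustration of what "hints from several layers" can mean in practice, the sketch below computes a FitNets-style hint loss: for each chosen layer pair, the student's features are passed through a small linear regressor and compared with the teacher's features by mean squared error, and the per-layer terms are summed. This is an assumption about the general technique, not the authors' method; the names `project`, `mse`, and `hint_loss`, the plain-list feature representation, and the unweighted sum over layers are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(features: Vector, weights: Matrix) -> Vector:
    """Linear regressor (hypothetical): maps student features to the teacher's width.

    `weights` has one row per teacher feature and one column per student feature.
    """
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hint_loss(
    teacher_feats: List[Vector],
    student_feats: List[Vector],
    regressors: List[Matrix],
) -> float:
    """Sum of per-layer MSE terms between teacher features and projected student features.

    Assumes the three lists are aligned: entry i describes the i-th hinted layer pair.
    """
    return sum(
        mse(t, project(s, w))
        for t, s, w in zip(teacher_feats, student_feats, regressors)
    )


# Toy example with two hinted layers (values are made up for illustration).
teacher = [[1.0, 2.0], [0.5, 0.5, 0.5]]
student = [[1.0], [1.0, 0.0]]
regs = [
    [[1.0], [2.0]],                        # maps 1 student feature to 2 teacher features
    [[0.5, 0.0], [0.5, 0.0], [0.0, 0.0]],  # maps 2 student features to 3 teacher features
]
loss = hint_loss(teacher, student, regs)
```

In a real training setup this term would typically be computed on intermediate activations of neural networks and added to the student's task loss; which layers to pair, and how to weight them, is a design decision this sketch leaves open.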