1 Dec 2016 • Jonathan Shen, Noranart Vesdapunt, Vishnu N. Boddeti, Kris M. Kitani
It has been observed that many of the parameters of a large network are redundant, allowing for the possibility of learning a smaller network that mimics the outputs of the large network through a process called Knowledge Distillation.
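The distillation objective described above, where a small "student" network is trained to match the softened output distribution of a large "teacher" network, can be sketched as follows. This is a minimal NumPy illustration of the standard temperature-softened KL-divergence loss (in the style of Hinton et al.), not the specific method of this paper; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature > 1 "softens" the distribution, exposing the
    # teacher's relative confidences across wrong classes.
    z = logits / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # KL(teacher || student) over temperature-softened distributions.
    # Minimizing this trains the student to mimic the teacher's outputs.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

In practice this term is usually combined with the ordinary cross-entropy loss on the true labels, weighted by a mixing coefficient.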