Improved Knowledge Distillation via Teacher Assistant

9 Feb 2019 · Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, Hassan Ghasemzadeh

Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large (teacher) pre-trained network is used to train a smaller (student) network...
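
To make the teacher/student setup concrete, here is a minimal sketch of a distillation loss for a single example. It assumes the commonly used formulation (a cross-entropy term on the true label plus a temperature-softened KL term against the teacher's outputs); the abstract above does not spell out the loss, so the function names, the `temperature` and `alpha` parameters, and their default values are illustrative assumptions rather than the paper's exact method.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Hypothetical per-example distillation loss (assumed formulation).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence between softened teacher and student distributions.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    p_student = softmax(student_logits)
    hard_loss = -math.log(p_student[label])

    # Soft-label term: KL(teacher || student) at the given temperature.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft_loss = sum(t * math.log(t / s)
                    for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor is the usual convention for keeping the soft term's
    # gradient scale comparable across temperatures.
    return alpha * hard_loss + (1 - alpha) * temperature ** 2 * soft_loss
```

A student whose logits already match the teacher's incurs no soft-label penalty, so with `alpha=0.0` the loss for identical logits is zero; the further the student's softened distribution drifts from the teacher's, the larger the second term becomes.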

