no code implementations • 24 Nov 2023 • Seonghak Kim, Gyeongdo Ham, SuIn Lee, Donggon Jang, Daeshik Kim
To distill optimal knowledge by adjusting non-target class predictions, we apply a higher temperature to low-energy samples to create smoother distributions and a lower temperature to high-energy samples to achieve sharper distributions.
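A minimal sketch of this energy-adaptive temperature idea is given below. It assumes the common convention E(x) = -logsumexp(logits) for the energy score and splits the batch at the median energy; the threshold, the two temperature values, and the loss weighting are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def energy_adaptive_kd_loss(student_logits, teacher_logits,
                            t_high=8.0, t_low=2.0):
    """Sketch of energy-based per-sample temperature scaling for KD.

    Energy is taken as -logsumexp of the teacher logits; low-energy
    samples receive the higher temperature (smoother teacher targets),
    high-energy samples the lower temperature (sharper targets).
    The median split and temperature values are illustrative only.
    """
    energy = -torch.logsumexp(teacher_logits, dim=1)            # (B,)
    threshold = energy.median()
    # higher T for low-energy samples, lower T for high-energy samples
    temps = torch.where(energy <= threshold,
                        torch.full_like(energy, t_high),
                        torch.full_like(energy, t_low)).unsqueeze(1)

    p_teacher = F.softmax(teacher_logits / temps, dim=1)
    log_p_student = F.log_softmax(student_logits / temps, dim=1)
    # per-sample KL divergence, scaled by T^2 as in standard KD
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
    return (kl * temps.squeeze(1) ** 2).mean()
```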
no code implementations • 24 Nov 2023 • Gyeongdo Ham, Seonghak Kim, SuIn Lee, Jae-Hyeok Lee, Daeshik Kim
Furthermore, we propose a method called cosine similarity weighted temperature (CSWT) to further improve performance.
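The sketch below illustrates one plausible reading of a cosine-similarity-weighted temperature: each sample's temperature is interpolated between a minimum and maximum value according to the cosine similarity between the student's and teacher's predictions. The direction of the mapping, the min-max normalization, and the constants are assumptions for illustration, not necessarily the paper's formulation.

```python
import torch
import torch.nn.functional as F

def cswt_kd_loss(student_logits, teacher_logits, t_min=2.0, t_max=6.0):
    """Sketch of a cosine-similarity-weighted temperature for KD.

    Per-sample cosine similarity between student and teacher predictions
    is normalised within the batch and mapped to a temperature in
    [t_min, t_max]; the mapping and constants are illustrative.
    """
    cos = F.cosine_similarity(F.softmax(student_logits, dim=1),
                              F.softmax(teacher_logits, dim=1), dim=1)  # (B,)
    # min-max normalise within the batch, then map to [t_min, t_max]
    cos_norm = (cos - cos.min()) / (cos.max() - cos.min() + 1e-8)
    temps = (t_min + (t_max - t_min) * cos_norm).unsqueeze(1)

    p_teacher = F.softmax(teacher_logits / temps, dim=1)
    log_p_student = F.log_softmax(student_logits / temps, dim=1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
    return (kl * temps.squeeze(1) ** 2).mean()
```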
no code implementations • 23 Nov 2023 • Seonghak Kim, Gyeongdo Ham, Yucheol Cho, Daeshik Kim
The performance of efficient, lightweight models (i.e., the student model) is improved through knowledge distillation (KD), which transfers knowledge from a more complex model (i.e., the teacher model).
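For reference, the standard knowledge distillation objective (Hinton et al., 2015) that these works build on combines the usual cross-entropy with the ground-truth labels and a KL divergence between temperature-softened teacher and student distributions; the temperature and weighting values below are illustrative.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels,
            temperature=4.0, alpha=0.9):
    """Standard KD loss: weighted sum of cross-entropy with the labels
    and T^2-scaled KL divergence between softened teacher and student
    predictions. Temperature and alpha are illustrative choices.
    """
    ce = F.cross_entropy(student_logits, labels)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    kl = F.kl_div(log_p_student, p_teacher,
                  reduction="batchmean") * temperature ** 2
    return alpha * kl + (1.0 - alpha) * ce
```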