1 code implementation • 19 May 2021 • Taehyeon Kim, Jaehoon Oh, Nakyil Kim, Sangwook Cho, Se-Young Yun
From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model.
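A minimal sketch of the MSE-on-logits objective described above, using NumPy; the function name `mse_kd_loss` and the toy arrays are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mse_kd_loss(student_logits, teacher_logits):
    # Direct logit matching: mean squared error averaged over
    # all entries of the (batch x classes) logit matrices.
    diff = student_logits - teacher_logits
    return np.mean(diff ** 2)

# Toy example: batch of 2 examples, 3 classes (values are made up).
student = np.array([[1.0, 2.0, 0.0], [0.5, 0.5, 0.5]])
teacher = np.array([[1.0, 3.0, 0.0], [0.5, 0.5, 1.5]])
loss = mse_kd_loss(student, teacher)  # (1^2 + 1^2) / 6 = 0.333...
```

Unlike the usual KL-divergence loss on temperature-softened probabilities, this objective matches the raw logit vectors directly, so no temperature hyperparameter is involved.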
no code implementations • 1 Jan 2021 • Taehyeon Kim, Jaehoon Oh, Nakyil Kim, Sangwook Cho, Se-Young Yun
To verify this conjecture, we test an extreme logit-learning model in which KD is implemented as the mean squared error (MSE) between the student's logits and the teacher's logits.
no code implementations • 7 Dec 2020 • Taehyeon Kim, Jaeyeon Ahn, Nakyil Kim, Se-Young Yun
In machine learning algorithms, the choice of hyperparameters is often more art than science, requiring labor-intensive search guided by expert experience.