Search Results for author: Sukjin Hong

Found 3 papers, 3 papers with code

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective

1 code implementation • 3 Feb 2023 • Jongwoo Ko, Seungjoon Park, Minchan Jeong, Sukjin Hong, Euijai Ahn, Du-Seong Chang, Se-Young Yun

Knowledge distillation (KD) is a highly promising method for mitigating the computational problems of pre-trained language models (PLMs).

Knowledge Distillation
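
For context on the technique named above: the sketch below shows plain soft-label knowledge distillation (temperature-scaled teacher logits combined with hard-label cross-entropy). It is only a minimal illustration of KD in general, not the intermediate-layer distillation method this paper studies; the function name `kd_loss` and its parameters are assumptions made for the example.

```python
# Minimal sketch of vanilla soft-label knowledge distillation (hypothetical
# helper, not the paper's intermediate-layer method).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend KL divergence to the teacher with cross-entropy on hard labels."""
    # Soft targets: KL between temperature-scaled student and teacher distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```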

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

1 code implementation • 20 Nov 2022 • Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi

In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders like BERT to improve the accuracy of the student model with the reduced-precision weight parameters.

Knowledge Distillation • Model Compression +1
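
To illustrate the combination described above (KD applied during quantization-aware training), the sketch below fake-quantizes weights with a straight-through estimator so a reduced-precision student can be trained against a full-precision teacher. This is a generic QAT-with-KD illustration under stated assumptions, not the paper's exact recipe; `fake_quantize` and `QuantLinear` are hypothetical names for the example.

```python
# Minimal sketch of QAT-style fake quantization that can be trained with a
# distillation loss (e.g. the kd_loss sketch above); names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w, num_bits=8):
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    # Forward pass uses quantized weights; gradients flow to w as if identity.
    return w + (w_q - w).detach()

class QuantLinear(nn.Module):
    """Linear layer whose weights are fake-quantized on every forward pass."""
    def __init__(self, in_features, out_features, num_bits=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.num_bits = num_bits

    def forward(self, x):
        return F.linear(x, fake_quantize(self.weight, self.num_bits), self.bias)
```

During training, the low-precision student's logits would be pulled toward the full-precision teacher's logits with a distillation objective such as the one sketched earlier.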
