1 code implementation • NeurIPS 2023 • Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi
Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning.
1 code implementation • 3 Feb 2023 • Jongwoo Ko, Seungjoon Park, Minchan Jeong, Sukjin Hong, Euijai Ahn, Du-Seong Chang, Se-Young Yun
Knowledge distillation (KD) is a highly promising method for mitigating the computational burden of pre-trained language models (PLMs).
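
For context, below is a minimal sketch of the standard soft-target distillation loss (in the style of Hinton et al.), not the specific method proposed in this paper; the temperature `T`, the weighting `alpha`, and the function name `distillation_loss` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KD: KL divergence between temperature-softened teacher
    and student distributions, blended with cross-entropy on hard labels."""
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean", log_target=True) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```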
1 code implementation • 20 Nov 2022 • Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi
In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders such as BERT to improve the accuracy of student models with reduced-precision weight parameters.
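
As a rough illustration of the QAT building block this entry refers to, here is a sketch of symmetric per-tensor fake quantization with a straight-through estimator (STE); the 8-bit setting and rounding scheme are generic assumptions, not the paper's exact quantization scheme. In QAT with KD, weights are fake-quantized in the student's forward pass while its outputs are matched against a full-precision teacher with a loss like the one sketched above.

```python
import torch

class FakeQuantize(torch.autograd.Function):
    """Symmetric per-tensor fake quantization with a straight-through
    estimator: quantize in the forward pass, pass gradients through
    unchanged in the backward pass."""

    @staticmethod
    def forward(ctx, w, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: treat quantization as identity when propagating gradients.
        return grad_output, None

# Example: fake-quantize one weight matrix during the student's forward pass.
w_q = FakeQuantize.apply(torch.randn(768, 768))
```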