1 code implementation • NeurIPS 2023 • Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi
Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning.
1 code implementation • 20 Nov 2022 • Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi
In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders such as BERT to improve the accuracy of the student model with reduced-precision weight parameters.
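To make the two ingredients concrete, here is a minimal sketch of (a) a soft-label distillation loss and (b) uniform fake quantization of weights, the basic building block of QAT. All function names, the temperature value, and the bit width are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label knowledge-distillation loss: KL divergence between
    temperature-softened teacher and student output distributions."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    # KL(teacher || student), scaled by T^2 as is conventional in KD
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return temperature ** 2 * kl.mean()

def fake_quantize(w, num_bits=8):
    """Uniform symmetric 'fake' quantization: weights are snapped to a
    low-precision grid in the forward pass while training continues in
    full precision -- a standard QAT ingredient."""
    scale = np.abs(w).max() / (2 ** (num_bits - 1) - 1)
    return np.round(w / scale) * scale
```

In a KD-QAT loop, the student forward pass would use `fake_quantize(w)` in place of `w`, and `kd_loss` would be minimized against a full-precision teacher's logits.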
no code implementations • 3 Dec 2021 • Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi
Non-linear operations such as GELU, layer normalization, and softmax are essential yet costly building blocks of Transformer models.
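As an illustration of why such operations are costly in hardware and how they are commonly approximated, the sketch below compares the exact GELU (which requires evaluating the error function) with the well-known tanh-based polynomial approximation; this is a generic example, not the approximation scheme proposed in the paper.

```python
import math
import numpy as np

def gelu_exact(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    erf = np.vectorize(math.erf)
    return x * 0.5 * (1.0 + erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """tanh-based approximation of GELU: replaces erf with a tanh of a
    cubic polynomial, which is cheaper to implement in fixed-function
    or low-precision hardware."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x ** 3)))
```

The approximation stays within a small absolute error of the exact function over the typical activation range, which is why it is a popular drop-in replacement when erf is too expensive.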