Search Results for author: Sihwa Lee

Found 3 papers, 2 papers with code

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

1 code implementation • 20 Nov 2022 • Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi

In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders like BERT to improve the accuracy of the student model whose weight parameters are kept at reduced precision (see the sketch below).

Knowledge Distillation · Model Compression · +1
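As context for the abstract above, here is a minimal sketch of the two ingredients it refers to: fake-quantized student weights for QAT and a soft-label distillation loss against a full-precision teacher. This is an illustrative PyTorch example, not the paper's exact recipe; the function names, bit-width, and temperature are assumptions for the sketch.

```python
# Sketch only: generic KD-for-QAT ingredients, not the paper's method.
import torch
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Forward uses the quantized weights; backward passes gradients straight to w.
    return w + (w_q - w).detach()

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-label KD loss: KL divergence between temperature-scaled logits."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Toy usage with random logits standing in for teacher/student model outputs.
teacher_logits = torch.randn(4, 10)
student_logits = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(f"KD loss: {loss.item():.4f}")
```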

NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

no code implementations • 3 Dec 2021 • Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi

Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models.
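To illustrate the general idea of table-based approximation of such non-linear ops, the sketch below precomputes GELU at uniform breakpoints and evaluates new inputs by linear interpolation between the two nearest entries. This is only a generic piecewise-linear LUT; NN-LUT itself learns its approximation with a small network, and the grid range and table size here are assumptions.

```python
# Sketch only: piecewise-linear LUT approximation of GELU, not NN-LUT itself.
import torch

def exact_gelu(x: torch.Tensor) -> torch.Tensor:
    """Reference (exact) GELU used to fill the table and measure error."""
    return 0.5 * x * (1.0 + torch.erf(x / 2.0 ** 0.5))

def build_lut(lo: float = -8.0, hi: float = 8.0, entries: int = 64):
    """Precompute (breakpoint, value) pairs of GELU on a uniform grid."""
    xs = torch.linspace(lo, hi, entries)
    return xs, exact_gelu(xs)

def lut_gelu(x: torch.Tensor, xs: torch.Tensor, ys: torch.Tensor) -> torch.Tensor:
    """Evaluate by linear interpolation between table entries; clamp outside the grid."""
    x_clamped = x.clamp(xs[0], xs[-1])
    step = xs[1] - xs[0]
    idx = ((x_clamped - xs[0]) / step).floor().long().clamp(max=len(xs) - 2)
    x0, y0 = xs[idx], ys[idx]
    slope = (ys[idx + 1] - y0) / step
    return y0 + slope * (x_clamped - x0)

# Toy usage: compare the LUT approximation against exact GELU on random inputs.
xs, ys = build_lut()
x = torch.randn(1000) * 3
err = (lut_gelu(x, xs, ys) - exact_gelu(x)).abs().max()
print(f"max abs error with 64 entries: {err:.4e}")
```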
