Search Results for author: Sihwa Lee

Found 3 papers, 2 papers with code

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

1 code implementation • 20 Nov 2022 • Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi

In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders like BERT to improve the accuracy of the student model whose weight parameters are kept at reduced precision (see the sketch below).

Knowledge Distillation · Model Compression · +1
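As context for the abstract above, here is a minimal sketch of the two ingredients it refers to: fake-quantized student weights for QAT and a soft-label distillation loss against a full-precision teacher. This is an illustrative PyTorch example, not the paper's exact recipe; the function names, bit-width, and temperature are assumptions for the sketch.

```python
# Sketch only: generic KD-for-QAT ingredients, not the paper's method.
import torch
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Forward uses the quantized weights; backward passes gradients straight to w.
    return w + (w_q - w).detach()

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-label KD loss: KL divergence between temperature-scaled logits."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Toy usage with random logits standing in for teacher/student model outputs.
teacher_logits = torch.randn(4, 10)
student_logits = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(f"KD loss: {loss.item():.4f}")
```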

NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

no code implementations • 3 Dec 2021 • Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi

Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models.
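To illustrate the general idea of table-based approximation of such non-linear ops, the sketch below precomputes GELU at uniform breakpoints and evaluates new inputs by linear interpolation between the two nearest entries. This is only a generic piecewise-linear LUT; NN-LUT itself learns its approximation with a small network, and the grid range and table size here are assumptions.

```python
# Sketch only: piecewise-linear LUT approximation of GELU, not NN-LUT itself.
import torch

def exact_gelu(x: torch.Tensor) -> torch.Tensor:
    """Reference (exact) GELU used to fill the table and measure error."""
    return 0.5 * x * (1.0 + torch.erf(x / 2.0 ** 0.5))

def build_lut(lo: float = -8.0, hi: float = 8.0, entries: int = 64):
    """Precompute (breakpoint, value) pairs of GELU on a uniform grid."""
    xs = torch.linspace(lo, hi, entries)
    return xs, exact_gelu(xs)

def lut_gelu(x: torch.Tensor, xs: torch.Tensor, ys: torch.Tensor) -> torch.Tensor:
    """Evaluate by linear interpolation between table entries; clamp outside the grid."""
    x_clamped = x.clamp(xs[0], xs[-1])
    step = xs[1] - xs[0]
    idx = ((x_clamped - xs[0]) / step).floor().long().clamp(max=len(xs) - 2)
    x0, y0 = xs[idx], ys[idx]
    slope = (ys[idx + 1] - y0) / step
    return y0 + slope * (x_clamped - x0)

# Toy usage: compare the LUT approximation against exact GELU on random inputs.
xs, ys = build_lut()
x = torch.randn(1000) * 3
err = (lut_gelu(x, xs, ys) - exact_gelu(x)).abs().max()
print(f"max abs error with 64 entries: {err:.4e}")
```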
