Search Results for author: Yongkweon Jeon

Found 10 papers, 1 paper with code

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

no code implementations • 14 Feb 2024 • Junhan Kim, Kyungphil Park, Chungman Lee, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon

Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models.

Quantization

Genie: Show Me the Data for Quantization

1 code implementation • CVPR 2023 • Yongkweon Jeon, Chungman Lee, Ho-young Kim

We also propose a post-training quantization algorithm to enhance the performance of quantized models.

Data Free Quantization
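
The tag above refers to data-free quantization, where no real calibration data is available. Below is a minimal, generic sketch of one common way such data is synthesized: random inputs are optimized until the statistics they induce inside the network match the running statistics stored in its BatchNorm layers. Genie's actual data distillation and post-training quantization algorithm are more involved, and the resnet18 backbone, step count, and learning rate here are arbitrary placeholders.

```python
# Generic data-free calibration sketch: optimize noise images so that the
# batch statistics at every BatchNorm input match the layer's stored
# running statistics. Not Genie's specific algorithm.
import torch
import torch.nn as nn
import torchvision.models as models  # assumption: torchvision is available

model = models.resnet18(weights=None).eval()  # placeholder; in practice a pretrained FP32 model
bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

# Hook the inputs of each BN layer so the induced batch statistics can be read.
feats = {}
def make_hook(idx):
    def hook(module, inputs, output):
        x = inputs[0]
        feats[idx] = (x.mean(dim=(0, 2, 3)), x.var(dim=(0, 2, 3), unbiased=False))
    return hook
for i, bn in enumerate(bn_layers):
    bn.register_forward_hook(make_hook(i))

# Optimize a batch of "distilled" images starting from noise.
x = torch.randn(16, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for step in range(100):                        # small budget for illustration
    opt.zero_grad()
    model(x)
    loss = 0.0
    for i, bn in enumerate(bn_layers):
        mean, var = feats[i]
        loss = loss + ((mean - bn.running_mean) ** 2).mean() \
                    + ((var - bn.running_var) ** 2).mean()
    loss.backward()
    opt.step()
# `x` can now serve as a synthetic calibration set for post-training quantization.
```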

Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error

no code implementations • CVPR 2022 • Yongkweon Jeon, Chungman Lee, Eulrang Cho, Yeonju Ro

We thus propose a new post-training non-uniform quantization method, called Mr. BiQ, allowing low bit-width quantization even on Transformer models.

Binarization • Quantization
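
The abstract's key ingredient is choosing quantization parameters by minimizing the reconstruction error of layer outputs rather than of the weights themselves. The toy sketch below illustrates only that objective on a single linear layer with a plain uniform grid; Mr.BiQ itself uses non-uniform, binary-code-based quantization and a more sophisticated optimizer.

```python
# Toy reconstruction-error PTQ on one linear layer: pick the quantization
# scale that minimizes the error of the layer *output* on calibration data,
# instead of the naive min-max scale.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))          # weights of a linear layer
X = rng.normal(size=(512, 128))         # calibration activations

def quantize(W, scale, bits=4):
    q = np.clip(np.round(W / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

# Candidate per-tensor scales around the naive min-max choice.
base = np.abs(W).max() / (2 ** 3 - 1)
candidates = base * np.linspace(0.3, 1.2, 50)

def output_err(scale):
    return np.linalg.norm(X @ W.T - X @ quantize(W, scale).T)

best = min(candidates, key=output_err)
print("output error, min-max scale      :", output_err(base))
print("output error, reconstruction-opt :", output_err(best))
```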

Modulating Regularization Frequency for Efficient Compression-Aware Model Training

no code implementations • 5 May 2021 • Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Baeseong Park, Yongkweon Jeon

While model compression is increasingly important because of large neural network sizes, compression-aware training is challenging, as it requires sophisticated model modifications and longer training time. In this paper, we introduce regularization frequency (i.e., how often compression is performed during training) as a new regularization technique for a practical and efficient compression-aware training method.

Model Compression
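
To make "regularization frequency" concrete, the toy loop below applies a quantization-style pull on the weights only once every `freq` optimizer steps, so `freq` is the knob being modulated. The quadratic toy loss, the specific regularizer, and all hyperparameters are illustrative stand-ins, not the paper's setup.

```python
# Toy compression-aware training where the compression regularization is
# applied only every `freq` steps (the "regularization frequency").
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=256)                      # toy "model" parameters
target = rng.normal(size=256)                 # toy regression target

def quantize(w, bits=3):
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

freq = 50        # regularization frequency: compress every 50 steps
lam = 0.1        # strength of the pull toward the compressed weights
lr = 0.01

for step in range(1, 2001):
    grad = w - target                         # gradient of 0.5*||w - target||^2
    w -= lr * grad
    if step % freq == 0:                      # the frequency being modulated
        w -= lam * (w - quantize(w))          # pull weights toward their quantized values

print("final distance to quantized grid:", np.abs(w - quantize(w)).mean())
```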

Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization

no code implementations • 5 May 2021 • Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh

When the number of quantization bits is relatively low, however, non-convex optimization is unavoidable to improve model accuracy.

Quantization
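
As a rough illustration of why non-convex optimization enters at low bit-widths, the sketch below treats the per-row clipping thresholds of a uniform quantizer as free variables and tunes them with a simple derivative-free coordinate search instead of fixing them from min/max statistics. This only conveys the flavor of the problem; it is not Q-Rater's actual algorithm or objective.

```python
# Tune per-row clipping thresholds of a uniform quantizer by a small
# derivative-free coordinate search, instead of using min-max statistics.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(32, 256)) * rng.uniform(0.2, 2.0, size=(32, 1))  # rows with different ranges

def quantize(W, clips, bits=3):
    n = 2 ** (bits - 1) - 1
    scale = clips[:, None] / n
    return np.clip(np.round(W / scale), -n, n) * scale

def mse(W, clips):
    return np.mean((W - quantize(W, clips)) ** 2)

clips = np.abs(W).max(axis=1)                   # start from per-row min-max
for sweep in range(5):                          # a few coordinate-descent sweeps
    for i in range(W.shape[0]):
        cand = clips[i] * np.linspace(0.4, 1.1, 30)
        errs = []
        for c in cand:
            trial = clips.copy()
            trial[i] = c
            errs.append(mse(W, trial))
        clips[i] = cand[int(np.argmin(errs))]

print("MSE with min-max clips :", mse(W, np.abs(W).max(axis=1)))
print("MSE with tuned clips   :", mse(W, clips))
```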

Post-Training Weighted Quantization of Neural Networks for Language Models

no code implementations • 1 Jan 2021 • Se Jung Kwon, Dongsoo Lee, Yongkweon Jeon, Byeongwook Kim, Bae Seong Park, Yeonju Ro

As a practical model compression technique, parameter quantization is effective especially for language models associated with a large memory footprint.

Model Compression • Quantization

FleXOR: Trainable Fractional Quantization

no code implementations • NeurIPS 2020 • Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun

Quantization based on binary codes is gaining attention because each quantized bit can be directly utilized for computations, without a dequantization step that relies on look-up tables.

Quantization
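
The abstract refers to binary-coding quantization, in which a weight vector is approximated as a sum of scaled {-1, +1} bit-planes so that each bit can participate in computation directly. The sketch below shows a greedy version of that representation and a dot product evaluated plane by plane; FleXOR's XOR-based encoding of fractional bit-widths is not reproduced here.

```python
# Greedy binary-coding quantization: w ≈ sum_i alpha[i] * B[i], B[i] in {-1,+1},
# and a dot product computed directly from the bit-planes.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1024)
x = rng.normal(size=1024)

def binary_code(w, num_bits=3):
    residual = w.copy()
    alphas, planes = [], []
    for _ in range(num_bits):
        b = np.sign(residual)
        b[b == 0] = 1.0
        a = np.abs(residual).mean()          # least-squares scale for a sign vector
        alphas.append(a)
        planes.append(b)
        residual = residual - a * b
    return np.array(alphas), np.array(planes)

alphas, planes = binary_code(w)

# Per bit-plane, only +/- additions of x are needed, followed by one multiply
# per plane; no dequantized weight tensor is ever materialized.
approx = sum(a * (b @ x) for a, b in zip(alphas, planes))
print("exact:", w @ x, " binary-coded:", approx)
```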

BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs

no code implementations • 20 May 2020 • Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee

In practice, the success of quantization therefore relies on an efficient computation engine design, especially for matrix multiplication, which is the core operation in most DNNs.

Quantization
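
A toy version of the lookup-table idea behind the title: for binary-coded weights, an inner product only ever adds or subtracts activation entries, so the 2^k signed partial sums of each group of k activations can be precomputed once and reused by every output row via a table index. The group size, layout, and absence of bit-packing below are simplifications; the paper's engine is considerably more elaborate.

```python
# Lookup-table matmul sketch for one binary bit-plane of the weights.
import numpy as np

rng = np.random.default_rng(0)
k = 8                                    # group size -> 2^k = 256 table entries
n = 256                                  # input length (multiple of k)
rows = 64
x = rng.normal(size=n)
B = rng.choice([-1.0, 1.0], size=(rows, n))      # one binary bit-plane of the weights

# Precompute, for every group of k activations, the partial sum for each of
# the 256 possible sign patterns. Pattern p: bit j set -> +x, cleared -> -x.
patterns = np.array([[1.0 if (p >> j) & 1 else -1.0 for j in range(k)]
                     for p in range(2 ** k)])            # (256, k)
tables = patterns @ x.reshape(-1, k).T                   # (256, n//k)

# Encode each row's group of k signs as a table index, then gather and add.
bits = (B.reshape(rows, -1, k) > 0).astype(np.int64)     # (rows, n//k, k)
idx = (bits << np.arange(k)).sum(axis=2)                 # (rows, n//k) indices
lut_result = tables[idx, np.arange(n // k)].sum(axis=1)  # gather + accumulate

print(np.allclose(lut_result, B @ x))                    # matches the direct product
```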

Decoupling Weight Regularization from Batch Size for Model Compression

no code implementations • 25 Sep 2019 • Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun, Gu-Yeon Wei

Using various models, we show that simple weight updates to comply with compression formats, along with a long NR period, are enough to achieve a high compression ratio and good model accuracy.

Model Compression
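
Assuming "NR period" denotes a stretch of training with no regularization at all (the excerpt does not spell this out), the toy schedule below alternates long plain-training windows with a single simple update that pulls the weights toward a compression format, using magnitude pruning as a stand-in format. Everything in it is illustrative rather than the paper's actual models or schedule.

```python
# Long no-regularization windows, each followed by one simple update that
# nudges the weights toward a compression format (magnitude pruning here).
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=512)
target = rng.normal(size=512)

def prune_format(w, sparsity=0.9):
    """Project onto the compression format: keep only the largest-magnitude weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

lr, lam = 0.01, 0.3
nr_period = 400                     # long stretch with no regularization at all

for step in range(1, 4001):
    w -= lr * (w - target)          # ordinary task update (toy quadratic loss)
    if step % nr_period == 0:       # end of an NR period: comply with the format
        w -= lam * (w - prune_format(w))

print("relative distance to the pruned format:",
      np.linalg.norm(w - prune_format(w)) / np.linalg.norm(w))
```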
