Search Results for author: Se Jung Kwon

Found 16 papers, 1 paper with code

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

no code implementations 8 Oct 2022 Se Jung Kwon, Jeonghoon Kim, Jeongin Bae, Kang Min Yoo, Jin-Hwa Kim, Baeseong Park, Byeongwook Kim, Jung-Woo Ha, Nako Sung, Dongsoo Lee

To combine parameter-efficient adaptation and model compression, we propose AlphaTuning, which consists of post-training quantization of the pre-trained language model and fine-tuning of only some parts of the quantized parameters for a target task.

Language Modelling Model Compression +1
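
Below is a minimal sketch (not the authors' code) of the AlphaTuning recipe described above: greedy binary-coding quantization of a pre-trained weight into binary matrices and scaling factors, followed by adaptation that updates only the scaling factors while the binary codes stay frozen. The shapes, the per-row scaling layout, and the toy regression objective are illustrative assumptions.

```python
import torch

def greedy_bcq(w, num_bits=3):
    """Greedy binary-coding quantization: w ~ sum_i alpha_i * B_i (per-row alphas)."""
    alphas, binaries, residual = [], [], w.clone()
    for _ in range(num_bits):
        b = torch.sign(residual)
        b[b == 0] = 1.0
        a = (residual * b).mean(dim=1, keepdim=True)   # least-squares per-row scale
        alphas.append(a)
        binaries.append(b)
        residual = residual - a * b
    return torch.cat(alphas, dim=1), torch.stack(binaries)

w = torch.randn(64, 128)                  # stand-in for a pre-trained weight
alpha, B = greedy_bcq(w)                  # post-training quantization
alpha = torch.nn.Parameter(alpha)         # only the scaling factors are trainable
B = B.detach()                            # binary codes stay frozen

def reconstruct(alpha, B):
    return (alpha.t().unsqueeze(-1) * B).sum(dim=0)    # sum_i alpha_i * B_i

opt = torch.optim.Adam([alpha], lr=1e-3)  # fine-tune alphas on a (toy) target task
x, target = torch.randn(8, 128), torch.randn(8, 64)
for _ in range(10):
    loss = torch.nn.functional.mse_loss(x @ reconstruct(alpha, B).t(), target)
    opt.zero_grad(); loss.backward(); opt.step()
```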

nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models

no code implementations 20 Jun 2022 Gunho Park, Baeseong Park, Sungjae Lee, Minsub Kim, Byeongwook Kim, Se Jung Kwon, Youngjoo Lee, Dongsoo Lee

Our proposed nuQmm reduces not only the latency on each GPU but also the end-to-end inference latency of large LMs, because a high compression ratio (from low-bit quantization) lowers the minimum required number of GPUs.

Quantization Self-Supervised Learning
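
A back-of-the-envelope sketch of the claim above: a higher compression ratio from low-bit weight quantization lowers the minimum number of GPUs needed just to hold the model, which in turn shortens multi-GPU inference latency. All concrete numbers below (model size, GPU memory, overhead factor) are illustrative assumptions, not figures from the paper.

```python
import math

def min_gpus(num_params, bits_per_weight, gpu_memory_gb=80, overhead=1.2):
    """Rough minimum GPU count to hold the weights (activations/KV cache ignored)."""
    weight_gb = num_params * bits_per_weight / 8 / 1e9
    return math.ceil(weight_gb * overhead / gpu_memory_gb)

params = 175e9                                   # hypothetical large generative LM
for bits in (16, 8, 4, 3):
    print(f"{bits:>2}-bit weights -> at least {min_gpus(params, bits)} GPU(s)")
```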

Maximum Likelihood Training of Implicit Nonlinear Diffusion Models

1 code implementation 27 May 2022 Dongjun Kim, Byeonghu Na, Se Jung Kwon, Dongsoo Lee, Wanmo Kang, Il-Chul Moon

Although diverse variations of diffusion models exist, extending the linear diffusion into a nonlinear diffusion process has been investigated by only a few works.

Image Generation

Maximum Likelihood Training of Parametrized Diffusion Model

no code implementations29 Sep 2021 Dongjun Kim, Byeonghu Na, Se Jung Kwon, Dongsoo Lee, Wanmo Kang, Il-Chul Moon

Specifically, PDM utilizes a flow to non-linearly transform a data variable into a latent variable, and then applies the diffusion process with a linear diffusing mechanism to the transformed latent distribution.

Image Generation
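
A toy sketch of the structure described in the snippet above (not the paper's implementation): an invertible flow layer maps data to a latent variable, and the standard linear (variance-preserving) forward diffusion is then applied in that latent space. The coupling layer, dimensions, and noise-schedule constants are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """A minimal invertible flow layer, used here as a stand-in for the paper's flow."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.Tanh(), nn.Linear(64, dim))
    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, shift = self.net(x1).chunk(2, dim=-1)
        return torch.cat([x1, x2 * log_s.exp() + shift], dim=-1)

def linear_forward_diffusion(z0, t, beta_min=0.1, beta_max=20.0):
    """Standard VP (linear) perturbation kernel: z_t = mean_coeff * z_0 + std * noise."""
    log_mean_coeff = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    mean = log_mean_coeff.exp()[:, None] * z0
    std = (1.0 - (2.0 * log_mean_coeff).exp()).sqrt()[:, None]
    return mean + std * torch.randn_like(z0)

flow = AffineCoupling(dim=8)
x = torch.randn(16, 8)                  # data batch
z0 = flow(x)                            # nonlinear transform into the latent space
t = torch.rand(16)                      # diffusion times in [0, 1]
zt = linear_forward_diffusion(z0, t)    # linear diffusion applied to the latent
```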

Encoding Weights of Irregular Sparsity for Fixed-to-Fixed Model Compression

no code implementations ICLR 2022 Baeseong Park, Se Jung Kwon, Daehwan Oh, Byeongwook Kim, Dongsoo Lee

Then, in an effort to push the compression ratio to the theoretical maximum (by entropy), we propose a sequential fixed-to-fixed encoding scheme.

Model Compression
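
As a small worked example of the "theoretical maximum (by entropy)" mentioned above, the sketch below computes the entropy bound for encoding an irregular binary sparsity mask with non-zero ratio p: no lossless encoding can average fewer than H(p) bits per position. The fixed-to-fixed encoding scheme itself is the paper's contribution and is not reproduced here.

```python
import math

def mask_entropy_bits(p):
    """Shannon entropy H(p) in bits per mask position."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.2, 0.1, 0.05):
    print(f"non-zero ratio {p:>4}: >= {mask_entropy_bits(p):.3f} bits/position "
          f"(vs. 1 bit/position for a raw binary mask)")
```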

Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization

no code implementations 5 May 2021 Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh

When the number of quantization bits is relatively low, however, non-convex optimization is unavoidable to improve model accuracy.

Quantization
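
The sketch below illustrates the setting described in the snippet, not the Q-Rater method itself: in post-training uniform quantization, the reconstruction error as a function of the clipping range is non-convex and non-smooth, so a simple search over candidate clipping values can beat the naive min/max choice. The bit-width, data, and search grid are illustrative assumptions.

```python
import numpy as np

def uniform_quantize(w, clip, bits=3):
    """Symmetric uniform quantization of w, with step size set by the clipping value."""
    levels = 2 ** bits - 1
    scale = 2 * clip / levels
    return np.clip(np.round(w / scale), -(levels // 2), levels // 2) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)          # stand-in for a trained weight tensor

# Grid search over clipping thresholds; the MSE objective is non-convex/non-smooth in clip.
candidates = np.linspace(0.1, np.abs(w).max(), 200)
errors = [np.mean((w - uniform_quantize(w, c)) ** 2) for c in candidates]
best = candidates[int(np.argmin(errors))]
print(f"naive clip = {np.abs(w).max():.3f}, searched clip = {best:.3f}")
```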

Modulating Regularization Frequency for Efficient Compression-Aware Model Training

no code implementations 5 May 2021 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Baeseong Park, Yongkweon Jeon

While model compression is increasingly important because of large neural network size, compression-aware training is challenging as it needs sophisticated model modifications and longer training time. In this paper, we introduce regularization frequency (i.e., how often compression is performed during training) as a new regularization technique for a practical and efficient compression-aware training method.

Model Compression
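
A minimal sketch of the "regularization frequency" knob introduced above, assuming magnitude pruning as the compression step (chosen only for illustration; this is not the authors' training recipe): compression is applied every `freq` optimizer steps, and `freq` is the hyperparameter being modulated.

```python
import torch
import torch.nn as nn

def prune_smallest(model, sparsity=0.5):
    """Zero out the smallest-magnitude weights in every Linear layer."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.Linear):
                k = int(m.weight.numel() * sparsity)
                thresh = m.weight.abs().flatten().kthvalue(k).values
                m.weight.mul_((m.weight.abs() > thresh).float())

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
freq = 10                                   # regularization frequency (in steps)

for step in range(100):
    x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    if (step + 1) % freq == 0:              # how often compression is performed
        prune_smallest(model)
```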

Post-Training Weighted Quantization of Neural Networks for Language Models

no code implementations 1 Jan 2021 Se Jung Kwon, Dongsoo Lee, Yongkweon Jeon, Byeongwook Kim, Bae Seong Park, Yeonju Ro

As a practical model compression technique, parameter quantization is especially effective for language models, which are associated with a large memory footprint.

Model Compression Quantization

FleXOR: Trainable Fractional Quantization

no code implementations NeurIPS 2020 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun

Quantization based on binary codes is gaining attention because each quantized bit can be directly utilized for computations without dequantization using look-up tables.

Quantization
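
The sketch below illustrates the computation pattern the snippet refers to: with binary-coding quantization W ~ sum_i alpha_i * B_i (B_i in {-1,+1}), the product Wx can be evaluated directly on the binary codes as sum_i alpha_i * (B_i x), with no dequantized full-precision copy of W. FleXOR's trainable XOR-based encoding of the bits is the paper's contribution and is not shown; the shapes and per-row scaling layout are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim, q = 64, 128, 2              # q = number of binary codes per weight
alpha = rng.random((q, out_dim, 1))          # per-row scaling factors (assumed layout)
B = rng.choice([-1.0, 1.0], size=(q, out_dim, in_dim))
x = rng.standard_normal(in_dim)

y_binary = sum(alpha[i][:, 0] * (B[i] @ x) for i in range(q))   # compute on the codes
W = (alpha * B).sum(axis=0)                                     # dequantized reference
assert np.allclose(y_binary, W @ x)
```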

BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs

no code implementations 20 May 2020 Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee

The success of quantization in practice hence relies on an efficient computation engine design, especially for matrix multiplication, which is a basic computation engine in most DNNs.

Quantization
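
A rough sketch of the lookup-table idea suggested by the title above (the group size and data layout are my assumptions, not the paper's design): for a {-1,+1} weight matrix, each group of mu inputs admits only 2^mu possible signed partial sums, so these are precomputed once per input vector and the matrix multiplication reduces to table lookups.

```python
import numpy as np

def lut_matvec(B, x, mu=4):
    """y = B @ x for B in {-1,+1}, via per-group lookup tables built from x."""
    out_dim, in_dim = B.shape
    assert in_dim % mu == 0
    patterns = np.array([[1.0 if (p >> i) & 1 else -1.0 for i in range(mu)]
                         for p in range(2 ** mu)])            # all 2^mu sign patterns
    y = np.zeros(out_dim)
    for g in range(in_dim // mu):
        table = patterns @ x[g * mu:(g + 1) * mu]             # 2^mu partial sums
        bits = (B[:, g * mu:(g + 1) * mu] > 0)                # which pattern each row uses
        idx = (bits * (1 << np.arange(mu))).sum(axis=1)       # pattern index per row
        y += table[idx]                                       # mu MACs -> one lookup
    return y

rng = np.random.default_rng(0)
B = rng.choice([-1.0, 1.0], size=(32, 64))
x = rng.standard_normal(64)
assert np.allclose(lut_matvec(B, x), B @ x)
```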

Network Pruning for Low-Rank Binary Index

no code implementations 25 Sep 2019 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Parichay Kapoor, Gu-Yeon Wei

In this paper, we propose a new network pruning technique that generates a low-rank binary index matrix to compress index data significantly.

Model Compression Network Pruning +1
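
A minimal illustration of the idea named above: store the pruning index (a binary mask) as a low-rank product that is binarized on the fly, instead of as a full n x m bit matrix. How the factors are actually learned jointly with pruning is the paper's contribution and is not reproduced here; the factors below are random and the sizes are assumptions.

```python
import numpy as np

n, m, r = 512, 512, 16
U = np.random.randn(n, r)
V = np.random.randn(r, m)
mask = (U @ V) > 0                     # binary index reconstructed from the factors

full_bits = n * m                      # dense binary index: 1 bit per weight
lowrank_vals = r * (n + m)             # two small factor matrices instead
print(f"dense index: {full_bits} bits; low-rank factors: {lowrank_vals} values "
      f"(rank {r}); reconstructed mask keeps {mask.mean():.2%} of weights")
```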

Decoupling Weight Regularization from Batch Size for Model Compression

no code implementations 25 Sep 2019 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun, Gu-Yeon Wei

Using various models, we show that simple weight updates to comply with compression formats, along with a long NR period, are enough to achieve a high compression ratio and high model accuracy.

Model Compression

Learning Low-Rank Approximation for CNNs

no code implementations 24 May 2019 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Gu-Yeon Wei

Low-rank approximation is an effective model compression technique that not only reduces parameter storage requirements but also reduces computation.

Model Compression
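
As a standard illustration of the technique named above (the paper learns the approximation during training, which is not shown here), the sketch below replaces a single m x n layer with two thin rank-r factors obtained by truncated SVD, cutting both parameter storage and multiply-adds.

```python
import numpy as np

m, n, r = 256, 512, 32
W = np.random.randn(m, n)                       # stand-in for a trained layer
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]                            # m x r factor
B = Vt[:r, :]                                   # r x n factor

x = np.random.randn(n)
print("approx error:", np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x))
print("params:", m * n, "->", r * (m + n))      # storage and per-input MACs both shrink
```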

Structured Compression by Weight Encryption for Unstructured Pruning and Quantization

no code implementations CVPR 2020 Se Jung Kwon, Dongsoo Lee, Byeongwook Kim, Parichay Kapoor, Baeseong Park, Gu-Yeon Wei

Model compression techniques, such as pruning and quantization, are becoming increasingly important to reduce the memory footprint and the amount of computation.

Model Compression Quantization

Network Pruning for Low-Rank Binary Indexing

no code implementations 14 May 2019 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Parichay Kapoor, Gu-Yeon Wei

Pruning is an efficient model compression technique to remove redundancy in the connectivity of deep neural networks (DNNs).

Model Compression Network Pruning
