Search Results for author: Baeseong Park

Found 11 papers, 1 papers with code

DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation

1 code implementation27 Feb 2024 Sunghyeon Woo, Baeseong Park, Byeongwook Kim, Minjung Jo, Sejung Kwon, Dongsuk Jeon, Dongsoo Lee

In this paper, we propose Dropping Backward Propagation (DropBP), a novel approach designed to reduce computational costs while maintaining accuracy.

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

no code implementations8 Oct 2022 Se Jung Kwon, Jeonghoon Kim, Jeongin Bae, Kang Min Yoo, Jin-Hwa Kim, Baeseong Park, Byeongwook Kim, Jung-Woo Ha, Nako Sung, Dongsoo Lee

To combine parameter-efficient adaptation and model compression, we propose AlphaTuning consisting of post-training quantization of the pre-trained language model and fine-tuning only some parts of quantized parameters for a target task.

Language Modelling Model Compression +1

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models

no code implementations20 Jun 2022 Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, Se Jung Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee

By reducing the latency of individual GPUs and the overall inference process for large-scale language models, LUT-GEMM provides significant performance improvements in inference.

Quantization Self-Supervised Learning

Modulating Regularization Frequency for Efficient Compression-Aware Model Training

no code implementations5 May 2021 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Baeseong Park, Yongkweon Jeon

While model compression is increasingly important because of large neural network size, compression-aware training is challenging as it needs sophisticated model modifications and longer training time. In this paper, we introduce regularization frequency (i. e., how often compression is performed during training) as a new regularization technique for a practical and efficient compression-aware training method.

Model Compression

Encoding Weights of Irregular Sparsity for Fixed-to-Fixed Model Compression

no code implementations ICLR 2022 Baeseong Park, Se Jung Kwon, Daehwan Oh, Byeongwook Kim, Dongsoo Lee

Then, as an effort to push the compression ratio to the theoretical maximum (by entropy), we propose a sequential fixed-to-fixed encoding scheme.

Model Compression

Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization

no code implementations5 May 2021 Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh

When the number of quantization bits is relatively low, however, non-convex optimization is unavoidable to improve model accuracy.

Quantization

FleXOR: Trainable Fractional Quantization

no code implementations NeurIPS 2020 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun

Quantization based on the binary codes is gaining attention because each quantized bit can be directly utilized for computations without dequantization using look-up tables.

Quantization

BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs

no code implementations20 May 2020 Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee

Success of quantization in practice, hence, relies on an efficient computation engine design, especially for matrix multiplication that is a basic computation engine in most DNNs.

Quantization

Decoupling Weight Regularization from Batch Size for Model Compression

no code implementations25 Sep 2019 Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun, Gu-Yeon Wei

Using various models, we show that simple weight updates to comply with compression formats along with long NR period is enough to achieve high compression ratio and model accuracy.

Model Compression

Structured Compression by Weight Encryption for Unstructured Pruning and Quantization

no code implementations CVPR 2020 Se Jung Kwon, Dongsoo Lee, Byeongwook Kim, Parichay Kapoor, Baeseong Park, Gu-Yeon Wei

Model compression techniques, such as pruning and quantization, are becoming increasingly important to reduce the memory footprints and the amount of computations.

Model Compression Quantization

Cannot find the paper you are looking for? You can Submit a new open access paper.