Search Results for author: Yongin Kwon

Found 5 papers, 2 papers with code

LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs

1 code implementation • 16 Apr 2024 • TaeHo Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha

Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints.
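LLMem derives a precise estimator; purely as context, the back-of-envelope sketch below counts the usual per-parameter costs of full fine-tuning (weights, gradients, and Adam optimizer state). The byte counts are conventional mixed-precision assumptions, not figures from the paper.

```python
def estimate_finetune_memory_gb(n_params: float,
                                param_bytes: int = 2,    # fp16 weights (assumed)
                                grad_bytes: int = 2,     # fp16 gradients (assumed)
                                optim_bytes: int = 12):  # Adam: fp32 master weights + 2 moments
    """Rough lower bound on GPU memory for full fine-tuning;
    ignores activations, temporary buffers, and fragmentation."""
    total_bytes = n_params * (param_bytes + grad_bytes + optim_bytes)
    return total_bytes / 1024**3

# A 7B-parameter model needs roughly 7e9 * 16 bytes, about 104 GB, by this estimate.
print(f"{estimate_finetune_memory_gb(7e9):.1f} GB")
```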

Tensor Slicing and Optimization for Multicore NPUs

no code implementations • 6 Apr 2023 • Rafael Sousa, Marcio Pereira, Yongin Kwon, TaeHo Kim, Namsoon Jung, Chang Soo Kim, Michael Frank, Guido Araujo

Although code generation for Convolutional Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly constrained Multicore Neural Processor Units (NPUs) is still a challenging problem.

Code Generation • Compiler Optimization
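The paper's contribution is a compiler optimization that chooses slice shapes under tight on-chip memory constraints; the toy sketch below shows only the most basic idea, splitting a tensor's channel dimension evenly across cores. The even-split policy and function name are illustrative assumptions, not the paper's algorithm.

```python
def slice_channels(n_channels: int, n_cores: int):
    """Split a channel dimension as evenly as possible across NPU
    cores; earlier cores take one extra channel when the division
    is uneven."""
    base, extra = divmod(n_channels, n_cores)
    ranges, start = [], 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# 130 output channels over 4 cores -> [(0, 33), (33, 66), (66, 98), (98, 130)]
print(slice_channels(130, 4))
```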

Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction

no code implementations • 22 Mar 2023 • Jemin Lee, Yongin Kwon, Jeman Park, Misun Yu, Sihyeong Park, Hwanjun Song

To overcome these challenges, we propose a new post-training quantization method, which is the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, EfficientFormerV2) with a significant margin (an average improvement of 8.32% for 8-bit and 26.02% for 6-bit) compared to existing PTQ methods (EasyQuant, FQ-ViT, and PTQ4ViT).

Quantization

CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution

1 code implementation • 4 Jul 2022 • Yongin Kwon, Jemin Lee, TaeHo Kim, Sangtae Ha

We propose CPrune, a compiler-informed model pruning approach for efficient target-aware DNN execution that supports applications with a required target accuracy.

Compiler Optimization • Image Classification +3
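CPrune's distinguishing step is using compiler tuning feedback to decide what to prune; the sketch below shows only a generic accuracy-constrained outer loop, with plain global magnitude pruning standing in for the compiler-informed criterion. The `evaluate` callback, step size, and sparsity cap are all hypothetical.

```python
import numpy as np

def prune_to_accuracy(weights: dict, evaluate, target_acc: float,
                      step: float = 0.05, max_sparsity: float = 0.95):
    """Raise global sparsity in small steps and stop just before
    accuracy falls below the required target."""
    sparsity = 0.0
    pruned = {k: v.copy() for k, v in weights.items()}
    while sparsity + step <= max_sparsity:
        mags = np.concatenate([np.abs(v).ravel() for v in weights.values()])
        threshold = np.quantile(mags, sparsity + step)
        candidate = {k: np.where(np.abs(w) < threshold, 0.0, w)
                     for k, w in weights.items()}
        if evaluate(candidate) < target_acc:
            break  # next step would violate the accuracy requirement
        sparsity += step
        pruned = candidate
    return pruned, sparsity
```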

Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment

no code implementations • 10 Feb 2022 • Jemin Lee, Misun Yu, Yongin Kwon, TaeHo Kim

To adopt convolutional neural networks (CNNs) for a range of resource-constrained targets, it is necessary to compress the models by quantization, whereby high-precision representations are converted to lower-bit representations.

Quantization
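Quantune's actual contribution is searching quantization configurations with extreme gradient boosting (XGBoost); as a minimal sketch of the lower-bit conversion described above (not the paper's method), here is uniform affine 8-bit post-training quantization in NumPy.

```python
import numpy as np

def quantize_uint8(x: np.ndarray):
    """Map float values to 8-bit integers with a scale and zero point
    (uniform affine quantization)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0   # avoid zero scale for constant tensors
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return (q.astype(np.float32) - zero_point) * scale
```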
