Search Results for author: Yeonju Ro

Found 4 papers, 0 papers with code

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

no code implementations5 Apr 2024 Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

In this work, we observe saturation in the computationally expensive feed-forward blocks of LLM layers and propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs (a hedged sketch follows below).

Attribute Hallucination
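The abstract describes skipping saturated FFN blocks during decoding. Below is a minimal PyTorch sketch of that idea, assuming a residual FFN sub-layer that is bypassed once its output barely differs from its input; the cosine-similarity test, the threshold value, and the carried per-step flag are illustrative assumptions, not the paper's exact criterion.

```python
import torch
import torch.nn.functional as F

class SkippableFFN(torch.nn.Module):
    """Wraps a residual feed-forward block and skips it once it 'saturates',
    i.e. once it barely changes the hidden state (hypothetical criterion)."""

    def __init__(self, ffn, threshold=0.99):
        super().__init__()
        self.ffn = ffn                # any feed-forward sub-layer, e.g. an MLP
        self.threshold = threshold    # similarity above this means "saturated"
        self.saturated = False        # decision carried across decoding steps

    def forward(self, x):
        if self.saturated:            # the block changed almost nothing before:
            return x                  # skip its compute entirely this step
        y = x + self.ffn(x)           # ordinary residual FFN
        sim = F.cosine_similarity(x.flatten(1), y.flatten(1), dim=-1).mean()
        self.saturated = bool(sim > self.threshold)
        return y
```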

Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error

no code implementations CVPR 2022 Yongkweon Jeon, Chungman Lee, Eulrang Cho, Yeonju Ro

We thus propose Mr.BiQ, a new post-training non-uniform quantization method that allows low bit-width quantization even on Transformer models (a hedged sketch follows below).

Binarization Quantization
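Per the tags above, Mr.BiQ's non-uniform quantization is binarization-based. The sketch below illustrates that general family with a greedy multi-bit binary decomposition that minimizes weight reconstruction error; the paper's actual objective and optimization differ, so treat this as an assumption-laden illustration, not Mr.BiQ itself.

```python
import torch

def greedy_binary_quantize(w, num_bases=3):
    """Approximate w as sum_i alpha_i * B_i with B_i in {-1, +1},
    greedily fitting each binary basis to the remaining residual."""
    residual = w.clone()
    alphas, codes = [], []
    for _ in range(num_bases):
        b = torch.sign(residual)
        b[b == 0] = 1.0                   # map exact zeros to +1
        alpha = (residual * b).mean()     # least-squares optimal scale
        residual = residual - alpha * b   # shrink the residual each round
        alphas.append(alpha)
        codes.append(b)
    return sum(a * b for a, b in zip(alphas, codes))

w = torch.randn(512, 512)
err = torch.norm(w - greedy_binary_quantize(w)) / torch.norm(w)
print(f"relative reconstruction error: {err.item():.3f}")
```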

Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization

no code implementations5 May 2021 Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh

When the number of quantization bits is relatively low, however, non-convex optimization becomes unavoidable to improve model accuracy (a hedged sketch follows below).

Quantization
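Q-Rater's abstract points at non-convex optimization of a uniform quantizer. A common way to set that up, shown below as a hypothetical sketch rather than Q-Rater's actual procedure, is to make the clipping range a learnable parameter and tune it by gradient descent through a straight-through rounding estimator.

```python
import torch

def quantize_ste(w, clip, bits=4):
    """Symmetric uniform quantizer with a straight-through estimator,
    so the clipping range stays differentiable through the rounding."""
    step = 2 * clip / (2 ** bits - 1)
    w_c = torch.clamp(w, -clip, clip) / step
    w_r = w_c + (torch.round(w_c) - w_c).detach()   # STE round
    return w_r * step

w = torch.randn(4096)
clip = torch.tensor(2.0, requires_grad=True)        # learnable clipping range
opt = torch.optim.Adam([clip], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = torch.mean((w - quantize_ste(w, clip)) ** 2)  # non-convex in clip
    loss.backward()
    opt.step()
print("tuned clipping range:", clip.item())
```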

Post-Training Weighted Quantization of Neural Networks for Language Models

no code implementations1 Jan 2021 Se Jung Kwon, Dongsoo Lee, Yongkweon Jeon, Byeongwook Kim, Bae Seong Park, Yeonju Ro

As a practical model compression technique, parameter quantization is especially effective for language models, which carry a large memory footprint (a hedged sketch follows below).

Model Compression Quantization
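Weighted quantization, as the title suggests, minimizes a reconstruction error in which individual parameters are weighted by some importance estimate. The sketch below illustrates that idea with a hypothetical importance tensor and a simple per-row grid search over clipping scales; the paper's importance measure and optimization are not reproduced here.

```python
import torch

def weighted_row_quantize(w, importance, bits=4, grid=64):
    """Per output row, grid-search the clipping scale that minimizes the
    importance-weighted reconstruction error sum(imp * (w - q(w))^2)."""
    levels = 2 ** bits - 1
    max_abs = w.abs().max(dim=1, keepdim=True).values
    best_err = torch.full((w.shape[0],), float("inf"))
    best_q = torch.zeros_like(w)
    for frac in torch.linspace(0.3, 1.0, grid):
        clip = max_abs * frac
        step = 2 * clip / levels
        q = torch.round(torch.clamp(w, -clip, clip) / step) * step
        err = (importance * (w - q) ** 2).sum(dim=1)
        better = err < best_err
        best_err = torch.where(better, err, best_err)
        best_q[better] = q[better]
    return best_q

w = torch.randn(128, 256)
importance = torch.rand(128, 256)   # stand-in, e.g. from activation statistics
w_q = weighted_row_quantize(w, importance)
```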
