no code implementations • 13 Dec 2024 • Junrui Xiao, Zhikai Li, Lianwei Yang, Yiduo Mei, Qingyi Gu
Post-training quantization (PTQ) reduces excessive hardware cost by quantizing full-precision models into lower bit representations on a tiny calibration set, without retraining.
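As a rough illustration of the general PTQ idea described above (not this paper's specific method), the sketch below calibrates a per-tensor scale and zero-point from a small calibration batch and then fake-quantizes new tensors; all names and values are illustrative.

```python
import numpy as np

def calibrate_affine(x, n_bits=8):
    """Derive scale/zero-point from a small calibration sample (min/max calibration)."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = max(x_max - x_min, 1e-8) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, n_bits=8):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** n_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# A tiny random batch stands in for the "tiny calibration set" mentioned above.
calib = np.random.randn(32, 128).astype(np.float32)
s, z = calibrate_affine(calib)
x = np.random.randn(4, 128).astype(np.float32)
x_hat = dequantize(quantize(x, s, z), s, z)
print("mean abs quantization error:", np.abs(x - x_hat).mean())
```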
1 code implementation • 4 Dec 2024 • Wangbo Zhao, Yizeng Han, Jiasheng Tang, Zhikai Li, Yibing Song, Kai Wang, Zhangyang Wang, Yang You
Vision-language models (VLMs) have shown remarkable success across various multi-modal tasks, yet large VLMs encounter significant efficiency challenges due to processing numerous visual tokens.
Ranked #19 on Visual Question Answering on MM-Vet
no code implementations • 22 Sep 2024 • Xuewen Liu, Zhikai Li, Qingyi Gu
The range of activations decreases, which makes activation quantization easier.
no code implementations • 14 Sep 2024 • Zhikai Li, Jing Zhang, Qingyi Gu
In this paper, we propose a data-free quantization framework for SAM, called DFQ-SAM, which learns and calibrates quantization parameters without any original data, thus effectively preserving data privacy during model compression.
no code implementations • 26 Aug 2024 • Zhikai Li, Xuewen Liu, Dongrong Fu, Jianquan Li, Qingyi Gu, Kurt Keutzer, Zhen Dong
The Arena platform, which gathers user votes on model comparisons, can rank models according to human preferences.
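Arena-style leaderboards are commonly derived from pairwise votes with Elo-style rating updates; the sketch below is a generic illustration of how pairwise preferences yield a ranking, not the paper's scoring rule.

```python
from collections import defaultdict

def elo_update(ratings, winner, loser, k=32.0):
    """One Elo update from a single pairwise vote (winner beat loser)."""
    ra, rb = ratings[winner], ratings[loser]
    expected_win = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    ratings[winner] = ra + k * (1.0 - expected_win)
    ratings[loser] = rb - k * (1.0 - expected_win)

ratings = defaultdict(lambda: 1000.0)
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
for winner, loser in votes:
    elo_update(ratings, winner, loser)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))  # ranking from human votes
```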
no code implementations • 13 Jun 2024 • Lianwei Yang, Zhikai Li, Junrui Xiao, Haisong Gong, Qingyi Gu
Extra-Block Global Supervision considers the relationship between block outputs and the model's output, aiding block-wise reconstruction through global supervision.
no code implementations • 28 Mar 2024 • Zhikai Li, Steve Vott, Bhaskar Krishnamachari
Finally, model inference can also be performed through a function call that provides the input.
2 code implementations • 26 Feb 2024 • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model for systematic analysis of LLM inference techniques.
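For reference, the roofline model bounds attainable throughput by the minimum of peak compute and memory bandwidth times arithmetic intensity; a minimal version with made-up hardware numbers is sketched below.

```python
def roofline_flops(arithmetic_intensity, peak_flops, mem_bandwidth):
    """Attainable FLOP/s = min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)

# Illustrative (made-up) hardware: 300 TFLOP/s peak compute, 2 TB/s memory bandwidth.
peak, bw = 300e12, 2e12
for ai in (1, 10, 150, 1000):  # FLOPs performed per byte moved
    print(f"AI={ai:>5}: {roofline_flops(ai, peak, bw) / 1e12:.1f} TFLOP/s attainable")
```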
no code implementations • 8 Feb 2024 • Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu
In particular, for the former, we introduce a learnable per-channel dual clipping scheme, which is designed to efficiently identify outliers in the unbalanced activations with fine granularity.
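Below is a minimal PyTorch-style sketch of what learnable per-channel dual (lower/upper) clipping before uniform fake-quantization might look like; the parameterization and straight-through estimator here are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class LearnableDualClip(nn.Module):
    """Per-channel learnable lower/upper clipping bounds plus uniform fake-quantization."""
    def __init__(self, num_channels, n_bits=8):
        super().__init__()
        self.lower = nn.Parameter(torch.full((num_channels,), -2.0))
        self.upper = nn.Parameter(torch.full((num_channels,), 2.0))
        self.levels = 2 ** n_bits - 1

    def forward(self, x):  # x: (batch, channels, ...)
        shape = (1, -1) + (1,) * (x.dim() - 2)
        lo, hi = self.lower.view(shape), self.upper.view(shape)
        x_clip = torch.minimum(torch.maximum(x, lo), hi)   # clip to learned [lower, upper]
        scale = (hi - lo) / self.levels
        x_q = torch.round((x_clip - lo) / scale) * scale + lo
        return x_clip + (x_q - x_clip).detach()            # straight-through estimator

clip = LearnableDualClip(num_channels=16)
out = clip(torch.randn(2, 16, 197))
out.sum().backward()  # gradients reach the per-channel clipping bounds
print(clip.lower.grad.shape, clip.upper.grad.shape)
```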
no code implementations • 22 Jan 2024 • Sihan Niu, Yifan Zhou, Zhikai Li, Shuyao Huang, Yujun Zhou
This paper presents a unique solution to challenges in medical image processing by incorporating an adaptive curve grey wolf optimization (ACGWO) algorithm into neural network backpropagation.
1 code implementation • 22 Jan 2024 • Zhikai Li, Murong Yi, Ali Uneri, Sihan Niu, Craig Jones
Polyp segmentation is a key aspect of colorectal cancer prevention, enabling early detection and guiding subsequent treatments.
1 code implementation • 9 Jan 2024 • Xuewen Liu, Zhikai Li, Junrui Xiao, Qingyi Gu
Unfortunately, we find that due to the highly dynamic distribution of activations in different denoising steps, existing PTQ methods for diffusion models suffer from distribution mismatch issues at both calibration sample level and reconstruction output level, which makes the performance far from satisfactory, especially in low-bit cases.
no code implementations • 11 Oct 2023 • Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer
Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks.
no code implementations • 24 May 2023 • Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu
In this paper, we first argue empirically that the severe performance degradation is mainly caused by the weight oscillation in the binarization training and the information distortion in the activations of ViTs.
no code implementations • 11 May 2023 • Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu
As emerging hardware begins to support mixed bit-width arithmetic computation, mixed-precision quantization is widely used to reduce the complexity of neural networks.
1 code implementation • ICCV 2023 • Zhikai Li, Junrui Xiao, Lianwei Yang, Qingyi Gu
Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a light and practical model compression technique.
1 code implementation • 13 Sep 2022 • Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu
In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT.
1 code implementation • ICCV 2023 • Zhikai Li, Qingyi Gu
In this paper, we propose I-ViT, an integer-only quantization scheme for ViTs, to enable ViTs to perform the entire computational graph of inference with integer arithmetic and bit-shifting, and without any floating-point arithmetic.
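The core trick in integer-only inference is replacing floating-point rescaling with a fixed-point multiply followed by a bit-shift; the sketch below shows this generic dyadic requantization step (not the exact I-ViT kernels).

```python
import numpy as np

def to_dyadic(scale, shift_bits=16):
    """Approximate a floating-point rescale factor as multiplier / 2**shift_bits."""
    multiplier = int(round(scale * (1 << shift_bits)))
    return multiplier, shift_bits

def requantize(acc_int32, multiplier, shift_bits):
    """Integer-only rescaling: multiply, then right-shift with rounding."""
    rounding = 1 << (shift_bits - 1)
    return (acc_int32.astype(np.int64) * multiplier + rounding) >> shift_bits

# Example: collapse input/weight/output scales into one integer rescale factor.
s_in, s_w, s_out = 0.02, 0.005, 0.04
m, s = to_dyadic(s_in * s_w / s_out)
acc = np.array([12345, -6789, 40000], dtype=np.int32)   # int32 accumulator values
print(requantize(acc, m, s))                             # integer outputs, no floats at inference
```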
1 code implementation • 4 Mar 2022 • Zhikai Li, Liping Ma, Mengjuan Chen, Junrui Xiao, Qingyi Gu
The above insights guide us to design a relative value metric to optimize the Gaussian noise to approximate the real images, which are then utilized to calibrate the quantization parameters.
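A highly simplified sketch of this data-free calibration idea follows: start from Gaussian noise, optimize it against a surrogate objective computed from the full-precision model, then record activation ranges on the synthetic images. The entropy-based objective below is a stand-in, not the paper's relative value metric, and all names are illustrative.

```python
import torch
import torch.nn as nn

def generate_calibration_images(model, num_images=8, steps=100, lr=0.1):
    """Optimize Gaussian noise so the full-precision model responds confidently to it."""
    model.eval()
    images = torch.randn(num_images, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([images], lr=lr)
    for _ in range(steps):
        probs = model(images).softmax(dim=-1)
        # Stand-in objective: minimize output entropy to sharpen the predictions.
        loss = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return images.detach()

def calibrate_ranges(model, images):
    """Record per-layer activation ranges on the synthetic images via forward hooks."""
    ranges, hooks = {}, []
    def make_hook(name):
        def hook(_, __, out):
            ranges[name] = (out.min().item(), out.max().item())
        return hook
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(images)
    for h in hooks:
        h.remove()
    return ranges  # used to set quantization parameters without any real data
```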