Search Results for author: Ruihao Gong

Found 37 papers, 22 papers with code

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models

1 code implementation · 10 Dec 2024 · Zining Wang, Jinyang Guo, Ruihao Gong, Yang Yong, Aishan Liu, Yushi Huang, Jiaheng Liu, Xianglong Liu

Our PTSBench can provide (1) new observations for a better understanding of PTS algorithms, (2) in-depth and comprehensive evaluations of the sparsification ability of models, and (3) a well-structured and easy-to-integrate open-source framework.

A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms

no code implementations · 25 Sep 2024 · Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Haotong Qin, Jinyang Guo, Michele Magno, Xianglong Liu

Large language models (LLMs) have achieved remarkable advancements in natural language processing, showcasing exceptional performance across various tasks.

Quantization

OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance

1 code implementation · 30 Jul 2024 · Yongqiang Yao, Jingru Tan, Jiahao Hu, Feizhao Zhang, Xin Jin, Bo Li, Ruihao Gong, PengFei Liu

We rebalanced the computational loads from data, model, and memory perspectives to address this issue, achieving more balanced computation across devices.

Temporal Feature Matters: A Framework for Diffusion Model Quantization

no code implementations · 28 Jul 2024 · Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, DaCheng Tao

However, unlike traditional models, diffusion models critically rely on the time-step for multi-round denoising.

Denoising Image Generation +1

ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

no code implementations · 13 Jun 2024 · Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang

LLM development involves pre-training a foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts.

Code Generation Domain Classification +3

Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes

1 code implementation · 9 May 2024 · Ruihao Gong, Yang Yong, Zining Wang, Jinyang Guo, Xiuying Wei, Yuqing Ma, Xianglong Liu

Previous methods for finding sparsity rates mainly focus on the training-aware scenario, which usually fails to converge stably under the PTS setting with limited data and much less training cost.
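
The allocation idea behind the title can be sketched concretely: each layer gets a learnable sparsity rate, and a single differentiable penalty ties their parameter-weighted average to one global budget. The sketch below is only illustrative; the sigmoid parameterization, layer sizes, and target are assumptions, not the paper's exact formulation.

```python
import torch

def global_sparsity_penalty(logits, layer_sizes, target=0.7):
    # Learnable per-layer sparsity rates in (0, 1) via sigmoid.
    rates = torch.sigmoid(logits)
    sizes = torch.tensor(layer_sizes, dtype=rates.dtype)
    # Parameter-weighted overall sparsity of the whole network.
    overall = (rates * sizes).sum() / sizes.sum()
    # Differentiable penalty tying all layers to one global constraint.
    return (overall - target) ** 2

logits = torch.zeros(4, requires_grad=True)  # four layers, start near 50% each
loss = global_sparsity_penalty(logits, layer_sizes=[1000, 4000, 4000, 500])
loss.backward()  # gradients push each layer's rate toward the global budget
```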

ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

no code implementations · 21 Feb 2024 · Shuzhang Zhong, Zebin Yang, Meng Li, Ruihao Gong, Runsheng Wang, Ru Huang

Additionally, it introduces a dynamic token tree generation algorithm to balance the computation and parallelism of the verification phase in real time and maximize overall efficiency across different batch sizes, sequence lengths, and tasks.

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

1 code implementation · CVPR 2024 · Yushi Huang, Ruihao Gong, Jing Liu, Tianlong Chen, Xianglong Liu

Remarkably, our quantization approach, for the first time, achieves model performance nearly on par with the full-precision model under 4-bit weight quantization.

Denoising Image Generation +1
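
A practical consequence of the time-step observation shared by the two diffusion entries above is that the temporal-feature path deserves special treatment during quantization. A hedged sketch of the simplest such policy, keeping modules on a hypothetically named time-embedding path in full precision while quantizing the rest; real diffusion UNets name these modules differently, and `quantize_fn` stands in for whatever fake-quantization wrapper the caller supplies.

```python
import torch.nn as nn

def quantize_except_temporal(model: nn.Module, quantize_fn,
                             skip_keywords=("time_emb", "temb")):
    # Snapshot the module tree, then swap Linear/Conv2d layers for quantized
    # versions, except on the (assumed) time-embedding path.
    for parent_name, parent in list(model.named_modules()):
        for child_name, child in list(parent.named_children()):
            full_name = f"{parent_name}.{child_name}" if parent_name else child_name
            if any(k in full_name for k in skip_keywords):
                continue  # temporal features stay full precision
            if isinstance(child, (nn.Linear, nn.Conv2d)):
                setattr(parent, child_name, quantize_fn(child))
    return model
```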

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

2 code implementations · 12 Oct 2023 · Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.

Quantization
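
The channel disassembly mentioned above can be sketched concretely: an outlier input channel is split into k sub-channels that each carry 1/k of its value, with the matching weight column replicated, so the layer's output is mathematically unchanged while the per-channel activation range shrinks. A minimal sketch; the paper's adaptive choice of k is omitted.

```python
import torch
import torch.nn as nn

def disassemble(linear: nn.Linear, x: torch.Tensor, idx: int, k: int):
    # Replicate the weight column of the outlier channel k-1 extra times.
    W = linear.weight.data                          # [out_features, in_features]
    new_W = torch.cat([W, W[:, idx:idx + 1].repeat(1, k - 1)], dim=1)
    new_linear = nn.Linear(new_W.shape[1], W.shape[0],
                           bias=linear.bias is not None)
    new_linear.weight.data = new_W
    if linear.bias is not None:
        new_linear.bias.data = linear.bias.data.clone()
    # Split the outlier activation channel into k equal sub-channels.
    new_x = x.clone()
    new_x[:, idx] /= k
    extra = new_x[:, idx:idx + 1].repeat(1, k - 1)
    return new_linear, torch.cat([new_x, extra], dim=1)

lin, x = nn.Linear(8, 4), torch.randn(2, 8)
lin2, x2 = disassemble(lin, x, idx=3, k=4)
assert torch.allclose(lin(x), lin2(x2), atol=1e-6)  # output preserved exactly
```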

Lossy and Lossless (L²) Post-training Model Size Compression

1 code implementation · ICCV 2023 (8 Aug 2023) · Yumeng Shi, Shihao Bai, Xiuying Wei, Ruihao Gong, Jianlei Yang

Then, a dedicated differentiable counter is introduced to guide the optimization of lossy compression to arrive at a more suitable point for later lossless compression.

SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

no code implementations · 1 Jul 2023 · Yan Wang, Yuhang Li, Ruihao Gong, Aishan Liu, Yanfei Wang, Jian Hu, Yongqiang Yao, Yunchen Zhang, Tianzi Xiao, Fengwei Yu, Xianglong Liu

Extensive studies have shown that deep learning models are vulnerable to adversarial and natural noises, yet little is known about model robustness on noises caused by different system implementations.

Benchmarking Data Augmentation +6
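
One concrete instance of such system-level inconsistency is image resizing: common libraries implement "bilinear" differently, so the same model can see different inputs at training and deployment time. A small illustration, assuming Pillow and OpenCV are installed:

```python
import numpy as np
import cv2
from PIL import Image

# The same image resized with two "bilinear" implementations.
img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
pil_out = np.asarray(Image.fromarray(img).resize((32, 32), Image.BILINEAR))
cv_out = cv2.resize(img, (32, 32), interpolation=cv2.INTER_LINEAR)

# The outputs disagree on many pixels: imperceptible to humans, but a
# distribution shift for a model trained on only one backend.
print("pixel mismatch rate:", (pil_out != cv_out).mean())
```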

Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization

no code implementations · CVPR 2023 · Aishan Liu, Shiyu Tang, Siyuan Liang, Ruihao Gong, Boxi Wu, Xianglong Liu, DaCheng Tao

In particular, we comprehensively evaluated the 20 most representative adversarially trained architectures on the ImageNette and CIFAR-10 datasets against multiple ℓp-norm adversarial attacks.

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

1 code implementation · 27 Sep 2022 · Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, Fengwei Yu, Xianglong Liu

With the trends of large NLP models, the increasing memory and computation costs hinder their efficient deployment on resource-limited devices.

Quantization
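
A core observation in this line of work is that LayerNorm's scale parameter γ amplifies outlier channels; migrating γ into the following layer leaves the network's function intact while handing a flatter activation to the quantizer. A minimal sketch of such a gamma migration for a LayerNorm that directly feeds a Linear (an assumption; real transformer blocks route γ to every consumer of the LayerNorm output):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def migrate_gamma(ln: nn.LayerNorm, fc: nn.Linear):
    # Assumes gamma has no zero entries.
    gamma = ln.weight.data.clone()
    fc.weight.data *= gamma        # fold gamma into the next layer's columns
    ln.bias.data /= gamma          # beta -> beta / gamma keeps outputs equal
    ln.weight.data.fill_(1.0)      # activation no longer carries gamma outliers

ln, fc = nn.LayerNorm(16), nn.Linear(16, 8)
x = torch.randn(4, 16)
ref = fc(ln(x))
migrate_gamma(ln, fc)
assert torch.allclose(ref, fc(ln(x)), atol=1e-5)  # function preserved
```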

QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization

2 code implementations · 11 Mar 2022 · Xiuying Wei, Ruihao Gong, Yuhang Li, Xianglong Liu, Fengwei Yu

With QDrop, the limit of PTQ is pushed to 2-bit activations for the first time, and the accuracy boost can be up to 51.49%.

Image Classification Object Detection +5
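
The mechanism behind the result quoted above is simple to sketch: during calibration, each activation element is quantized only with some probability, so reconstruction sees a random blend of quantized and full-precision values. A minimal version; `fake_quant` stands in for any fake-quantization function, and the drop probability of 0.5 is an illustrative assumption.

```python
import torch

def qdrop(x: torch.Tensor, fake_quant, p: float = 0.5) -> torch.Tensor:
    # Element-wise: with probability p keep the full-precision value,
    # otherwise use its quantized counterpart.
    keep_fp = (torch.rand_like(x) < p).to(x.dtype)
    return keep_fp * x + (1.0 - keep_fp) * fake_quant(x)

fq = lambda t: torch.round(t * 2) / 2   # toy fake quantizer: 0.5-step grid
x = torch.randn(4, 8)
print(qdrop(x, fq))  # a random mix of raw and quantized activations
```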

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

1 code implementation · 5 Nov 2021 · Yuhang Li, Mingzhu Shen, Jian Ma, Yan Ren, Mingxin Zhao, Qi Zhang, Ruihao Gong, Fengwei Yu, Junjie Yan

Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work could inspire future research directions.

Model Quantization

Distribution-sensitive Information Retention for Accurate Binary Neural Network

no code implementations · 25 Sep 2021 · Haotong Qin, Xiangguo Zhang, Ruihao Gong, Yifu Ding, Yi Xu, Xianglong Liu

We present a novel Distribution-sensitive Information Retention Network (DIR-Net) that retains the information in the forward and backward propagation by improving internal propagation and introducing external representations.

Binarization Image Classification +1

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

1 code implementation · 11 Sep 2021 · Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, Xinyun Chen, Fengwei Yu, Xianglong Liu, Dawn Song, Alan Yuille, Philip H. S. Torr, DaCheng Tao

Thus, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet regarding ARchitecture design (49 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises).

Adversarial Robustness Benchmarking +2

Real World Robustness from Systematic Noise

no code implementations · 2 Sep 2021 · Yan Wang, Yuhang Li, Ruihao Gong

Systematic error, which is not determined by chance, often refers to the inaccuracy (involving either the observation or measurement process) inherent to a system.

A Free Lunch From ANN: Towards Efficient, Accurate Spiking Neural Networks Calibration

1 code implementation · 13 Jun 2021 · Yuhang Li, Shikuang Deng, Xin Dong, Ruihao Gong, Shi Gu

Moreover, our calibration algorithm can produce SNN with state-of-the-art architecture on the large-scale ImageNet dataset, including MobileNet and RegNet.

Diversifying Sample Generation for Accurate Data-Free Quantization

no code implementations · CVPR 2021 · Xiangguo Zhang, Haotong Qin, Yifu Ding, Ruihao Gong, Qinghua Yan, Renshuai Tao, Yuhang Li, Fengwei Yu, Xianglong Liu

Unfortunately, we find that in practice, synthetic data identically constrained by BN statistics suffers from serious homogenization at both the distribution and sample levels, which further causes a significant performance drop in the quantized model.

Data Free Quantization Image Classification
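
The BN-statistics constraint the snippet refers to is easy to sketch: synthetic images are optimized so that the batch statistics at every BatchNorm match its stored running statistics. The sketch below shows only this baseline constraint (the paper's point is that using it alone homogenizes samples, which DSG then counteracts); the tiny model is illustrative.

```python
import torch
import torch.nn as nn

class BNStatLoss:
    """Forward hook: distance between the batch statistics of the current
    input and the BN layer's stored running statistics."""
    def __init__(self, bn: nn.BatchNorm2d):
        self.loss = torch.tensor(0.0)
        bn.register_forward_hook(self)

    def __call__(self, bn, inputs, output):
        x = inputs[0]
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        self.loss = ((mean - bn.running_mean) ** 2).sum() + \
                    ((var - bn.running_var) ** 2).sum()

# Tiny illustrative model; eval() keeps the running statistics frozen.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
model.eval()
hooks = [BNStatLoss(m) for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

x = torch.randn(16, 3, 32, 32, requires_grad=True)   # synthetic images
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(100):
    opt.zero_grad()
    model(x)
    sum(h.loss for h in hooks).backward()
    opt.step()
```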

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

3 code implementations · ICLR 2021 · Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu

To further exploit the power of quantization, the mixed precision technique is incorporated into our framework by approximating the inter-layer and intra-layer sensitivity.

Image Classification Object Detection +2
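
The block reconstruction at the heart of this approach can be sketched as a small optimization loop: the quantized block's free parameters are tuned so its outputs match those of the full-precision block on a handful of calibration batches. A minimal sketch; the adaptive-rounding parameterization and the sensitivity-based mixed precision from the paper are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reconstruct_block(fp_block: nn.Module, q_block: nn.Module,
                      calib_batches, lr: float = 1e-3, iters: int = 500):
    # Tune only the quantized block so its outputs track the
    # full-precision block on the calibration data.
    opt = torch.optim.Adam(q_block.parameters(), lr=lr)
    for _ in range(iters):
        for x in calib_batches:
            with torch.no_grad():
                target = fp_block(x)          # full-precision reference
            loss = F.mse_loss(q_block(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return q_block
```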

MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing

no code implementations · ICCV 2021 · Yuhang Li, Feng Zhu, Ruihao Gong, Mingzhu Shen, Xin Dong, Fengwei Yu, Shaoqing Lu, Shi Gu

However, the inversion process only utilizes biased feature statistics stored in one model and maps from a low-dimensional space to a high-dimensional one.

All Knowledge Distillation +2

Binary Neural Networks: A Survey

2 code implementations · 31 Mar 2020 · Haotong Qin, Ruihao Gong, Xianglong Liu, Xiao Bai, Jingkuan Song, Nicu Sebe

The binary neural network, which greatly reduces storage and computation costs, serves as a promising technique for deploying deep models on resource-limited devices.

Binarization Image Classification +5
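
The standard training trick surveyed here is sign binarization with a straight-through estimator: the forward pass uses sign(x), while the backward pass lets gradients through wherever |x| ≤ 1. A minimal PyTorch version of this common recipe:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign(x). Backward: clipped straight-through estimator,
    passing gradients only where |x| <= 1."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).float()

x = torch.randn(5, requires_grad=True)
y = BinarizeSTE.apply(x)
y.sum().backward()          # gradients flow only where |x| <= 1
```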

Efficient Bitwidth Search for Practical Mixed Precision Neural Network

no code implementations · 17 Mar 2020 · Yuhang Li, Wei Wang, Haoli Bai, Ruihao Gong, Xin Dong, Fengwei Yu

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks.

Quantization

Towards Unified INT8 Training for Convolutional Neural Network

no code implementations · CVPR 2020 · Feng Zhu, Ruihao Gong, Fengwei Yu, Xianglong Liu, Yanfei Wang, Zhelong Li, Xiuqi Yang, Junjie Yan

In this paper, we attempt to build a unified 8-bit (INT8) training framework for common convolutional neural networks from the aspects of both accuracy and speed.

Object Detection +1

Balanced Binary Neural Networks with Gated Residual

1 code implementation · 26 Sep 2019 · Mingzhu Shen, Xianglong Liu, Ruihao Gong, Kai Han

In this paper, we attempt to maintain the information propagated in the forward process and propose Balanced Binary Neural Networks with Gated Residual (BBG for short).

Ranked #1049 on Image Classification on ImageNet (Top 1 Accuracy metric)

Binarization General Classification +1

Forward and Backward Information Retention for Accurate Binary Neural Networks

2 code implementations · CVPR 2020 · Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song

Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks.

Binarization Neural Network Compression +1
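
On the forward side, information retention is often pursued by balancing and standardizing weights before the sign function so the binary tensor stays close to half +1 / half −1. A simplified sketch in that spirit; the paper's integer bit-shift scale is replaced here by a plain mean-absolute scale, which is an assumption.

```python
import torch

def balanced_binarize(w: torch.Tensor) -> torch.Tensor:
    # Zero-center and standardize so sign() yields a near-balanced,
    # higher-entropy binary tensor.
    w = (w - w.mean()) / (w.std() + 1e-8)
    scale = w.abs().mean()          # simple scalar scale (assumption)
    return scale * torch.sign(w)

w = torch.randn(128)
b = balanced_binarize(w)
print((b > 0).float().mean())       # close to 0.5: balanced binarization
```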

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

2 code implementations · ICCV 2019 · Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate inference and reduce the memory consumption of deep neural networks, which is crucial for model deployment on resource-limited devices like mobile phones.

Quantization
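
The "soft" in the title refers to replacing hard rounding with a tanh-shaped staircase that stays differentiable and approaches true rounding as a temperature grows, so quantized networks can be trained with informative gradients. A simplified single-formula sketch for an integer grid; DSQ itself operates per clipped interval with learned parameters.

```python
import math
import torch

def soft_round(x: torch.Tensor, k: float = 10.0) -> torch.Tensor:
    # Position inside the current integer bin, in [-0.5, 0.5).
    f = x - torch.floor(x) - 0.5
    # tanh soft staircase; as k -> inf this converges to round(x).
    return torch.floor(x) + 0.5 + 0.5 * torch.tanh(k * f) / math.tanh(k / 2)

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
y = soft_round(x)
y.sum().backward()   # informative gradients, unlike hard rounding
print(x.grad)
```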
