Search Results for author: Zhihang Yuan

Found 21 papers, 13 papers with code

S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search

no code implementations ECCV 2020 Zhihang Yuan, Bingzhe Wu, Guangyu Sun, Zheng Liang, Shiwan Zhao, Weichen Bi

To this end, based on a given CNN model, we first generate a CNN architecture space in which each architecture is a multi-stage CNN generated from the given model using some predefined transformations.

Neural Architecture Search

LLM Inference Unveiled: Survey and Roofline Model Insights

2 code implementations 26 Feb 2024 Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model for the systematic analysis of LLM inference techniques.

Knowledge Distillation Language Modelling +3
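
As a rough illustration of the kind of roofline analysis the survey builds on, the Python sketch below bounds a kernel's execution time by the larger of its compute term and its memory-traffic term. The kernel shape and the peak-throughput and bandwidth figures are hypothetical placeholders chosen for illustration, not numbers taken from the paper.

def roofline_time(flops, bytes_moved, peak_flops, peak_bw):
    """Lower-bound kernel time under a simple roofline model:
    the kernel is limited by either compute or memory traffic."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Hypothetical single-token decode matmul against a 4096x4096 fp16 weight
# (batch size 1), which lands firmly in the memory-bound regime.
flops = 2 * 4096 * 4096          # one multiply and one add per weight
bytes_moved = 2 * 4096 * 4096    # fp16 weight bytes dominate the traffic
peak_flops = 312e12              # assumed accelerator fp16 peak (FLOP/s)
peak_bw = 2.0e12                 # assumed memory bandwidth (bytes/s)

intensity = flops / bytes_moved  # arithmetic intensity in FLOP/byte
t = roofline_time(flops, bytes_moved, peak_flops, peak_bw)
print(f"arithmetic intensity = {intensity:.1f} FLOP/B, time >= {t * 1e6:.2f} us")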

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

no code implementations 19 Feb 2024 Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, Liqiang Nie

Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of the auto-regressive text generation process.

Quantization Text Generation

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

1 code implementation 6 Feb 2024 Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan

Diffusion models have achieved remarkable success in image generation tasks, yet their practical deployment is constrained by high memory and time consumption.

Image Generation Model Compression +1

MIM4DD: Mutual Information Maximization for Dataset Distillation

1 code implementation NeurIPS 2023 Yuzhang Shang, Zhihang Yuan, Yan Yan

Thus, we introduce mutual information (MI) as the metric to quantify the information shared between the synthetic and real datasets, and devise MIM4DD to numerically maximize the MI via a newly designed optimizable objective within a contrastive learning framework, updating the synthetic dataset.

Contrastive Learning
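
The excerpt above mentions maximizing mutual information through a contrastive objective. A generic InfoNCE-style loss, whose minimization maximizes a lower bound on the MI between matched synthetic/real feature pairs, is sketched below; it is only that generic form, not the MIM4DD objective itself, and the batch size, feature dimension, and temperature are arbitrary.

import numpy as np

def info_nce(synthetic_feats, real_feats, temperature=0.1):
    # Matched synthetic/real pairs are positives; every other pairing in the
    # batch is a negative. Minimizing this loss maximizes a lower bound on
    # the mutual information between the two feature sets.
    s = synthetic_feats / np.linalg.norm(synthetic_feats, axis=1, keepdims=True)
    r = real_feats / np.linalg.norm(real_feats, axis=1, keepdims=True)
    logits = s @ r.T / temperature                  # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # positives sit on the diagonal

rng = np.random.default_rng(0)
loss = info_nce(rng.normal(size=(8, 128)), rng.normal(size=(8, 128)))
print(f"contrastive loss on random features: {loss:.3f}")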

Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting

1 code implementation 17 Dec 2023 Dawei Yang, Ning He, Xing Hu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang

Although neural networks have made remarkable advancements in various applications, they require substantial computational and memory resources.

Quantization

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

1 code implementation 10 Dec 2023 Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun

This paper explores a new post-hoc training-free compression paradigm for compressing Large Language Models (LLMs) to facilitate their wider adoption in various computing environments.
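
One plausible way to picture "activation-aware" low-rank compression is sketched below: scale each input channel of a weight matrix by a proxy of its typical activation magnitude before the SVD, truncate, and fold the scaling back into the factors. The scaling scheme, the calibration statistic, and the chosen rank here are assumptions made for illustration, not the paper's actual ASVD procedure.

import numpy as np

def low_rank_compress(w, act_scale, rank):
    # Weight the SVD by a per-input-channel activation statistic so that the
    # truncation error is biased toward channels that matter at inference,
    # then fold the scaling back so that w is approximated by a @ b.
    s = np.diag(act_scale)
    u, sigma, vt = np.linalg.svd(w @ s, full_matrices=False)
    a = u[:, :rank] * sigma[:rank]          # (out_dim, rank)
    b = vt[:rank] @ np.linalg.inv(s)        # (rank, in_dim), scaling undone
    return a, b

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64))
act_scale = np.abs(rng.normal(size=64)) + 0.1   # stand-in for calibration stats
a, b = low_rank_compress(w, act_scale, rank=16)
print(f"relative reconstruction error: {np.linalg.norm(w - a @ b) / np.linalg.norm(w):.3f}")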

PB-LLM: Partially Binarized Large Language Models

2 code implementations 29 Sep 2023 Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong

This paper explores network binarization, a radical form of quantization that compresses model weights to a single bit, specifically for the compression of Large Language Models (LLMs).

Binarization Quantization
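
To make "partial binarization" concrete, the toy sketch below keeps a small fraction of large-magnitude weights at full precision and replaces the remainder with sign(w) times a shared scaling factor. The salient-weight criterion, the fraction kept, and the per-tensor scale are illustrative assumptions, not PB-LLM's actual scheme.

import numpy as np

def partially_binarize(w, salient_frac=0.1):
    # Keep the largest-magnitude weights untouched and map everything else to
    # a 1-bit representation: sign(w) scaled by the mean magnitude of the
    # binarized set.
    flat = np.abs(w).ravel()
    k = max(1, int(salient_frac * flat.size))
    thresh = np.partition(flat, -k)[-k]          # magnitude cutoff for salient weights
    salient = np.abs(w) >= thresh
    alpha = np.abs(w[~salient]).mean()           # shared scaling factor
    return np.where(salient, w, alpha * np.sign(w))

w = np.random.default_rng(2).normal(size=(4, 8))
print(np.round(partially_binarize(w), 3))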

Latency-aware Unified Dynamic Networks for Efficient Image Recognition

1 code implementation 30 Aug 2023 Yizeng Han, Zeyu Liu, Zhihang Yuan, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang

Dynamic computation has emerged as a promising avenue to enhance the inference efficiency of deep networks.

Scheduling

RPTQ: Reorder-based Post-training Quantization for Large Language Models

1 code implementation 3 Apr 2023 Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers.

Quantization
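
A minimal way to visualize that varying-range observation is sketched below: sort channels by their value range, split them into groups, and give each group its own quantization scale instead of one per-tensor scale. The even split used here is a simple stand-in for RPTQ's actual reordering and clustering.

import numpy as np

def grouped_fake_quantize(x, n_groups=4, n_bits=8):
    # Channels with similar ranges share a quantization scale, so a few
    # wide-range channels no longer dictate the scale of every other channel.
    ranges = x.max(axis=0) - x.min(axis=0)          # per-channel value range
    order = np.argsort(ranges)                      # reorder channels by range
    qmax = 2 ** (n_bits - 1) - 1
    out = np.empty_like(x)
    for group in np.array_split(order, n_groups):
        scale = np.abs(x[:, group]).max() / qmax    # one scale per group
        out[:, group] = np.round(x[:, group] / scale) * scale
    return out

rng = np.random.default_rng(3)
x = rng.normal(size=(16, 32)) * np.linspace(0.1, 10.0, 32)   # uneven channel ranges
print(f"mean abs error: {np.abs(grouped_fake_quantize(x) - x).mean():.4f}")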

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance

no code implementations 23 Mar 2023 Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu

Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.

Benchmarking Data Augmentation +1

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

1 code implementation CVPR 2023 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

It determines the quantization parameters by using the difference between the network's predictions before and after quantization.

Neural Network Compression Quantization
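
The sketch below illustrates the general idea of selecting quantization parameters by prediction difference rather than by weight rounding error, using a single matmul as a stand-in for the network's prediction. The candidate clip ratios and this one-layer proxy are assumptions; PD-Quant itself works with the full network output and additional regularization.

import numpy as np

def scale_by_prediction_difference(w, x, n_bits=8, n_candidates=20):
    # Try several clipping ratios and keep the one whose quantized output is
    # closest to the full-precision output, instead of the one that best
    # reconstructs the weights themselves.
    qmax = 2 ** (n_bits - 1) - 1
    y_fp = x @ w                                    # full-precision "prediction"
    best_scale, best_err = None, np.inf
    for clip in np.linspace(0.5, 1.0, n_candidates):
        scale = clip * np.abs(w).max() / qmax
        w_q = np.clip(np.round(w / scale), -qmax, qmax) * scale
        err = np.mean((x @ w_q - y_fp) ** 2)        # prediction difference
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

rng = np.random.default_rng(4)
scale, err = scale_by_prediction_difference(rng.normal(size=(64, 16)),
                                            rng.normal(size=(8, 64)))
print(f"chosen scale = {scale:.4f}, prediction MSE = {err:.6f}")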

Post-training Quantization on Diffusion Models

1 code implementation CVPR 2023 Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, Yan Yan

These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise.

Denoising Noise Estimation +1
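
For reference, the closed-form forward (noising) process that such samplers build on can be written in a few lines; the sketch below shows only that textbook process with a conventional linear beta schedule, not the paper's post-training quantization method.

import numpy as np

def forward_diffuse(x0, t, betas, rng):
    # Variance-preserving forward process sampled in closed form:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(5)
betas = np.linspace(1e-4, 0.02, 1000)       # common linear schedule
x0 = rng.normal(size=(3, 32, 32))           # stand-in for an image tensor
xt, eps = forward_diffuse(x0, t=500, betas=betas, rng=rng)
print(f"remaining signal fraction at t=500: {np.sqrt(np.cumprod(1.0 - betas)[500]):.3f}")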

Latency-aware Spatial-wise Dynamic Networks

2 code implementations 12 Oct 2022 Yizeng Han, Zhihang Yuan, Yifan Pu, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang

The latency prediction model can efficiently estimate the inference latency of dynamic networks by simultaneously considering algorithms, scheduling strategies, and hardware properties.

Image Classification Instance Segmentation +4

PTQ4ViT: Post-Training Quantization Framework for Vision Transformers with Twin Uniform Quantization

1 code implementation 24 Nov 2021 Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, Guangyu Sun

We observe that the distributions of activation values after the softmax and GELU functions are quite different from the Gaussian distribution.

Quantization
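
To see why a single uniform range fits such skewed, non-Gaussian activations poorly, the toy sketch below splits values into two regions and gives each its own uniform scale. The median split is an arbitrary stand-in; PTQ4ViT's twin uniform quantization chooses its two ranges quite differently.

import numpy as np

def two_region_quantize(x, n_bits=8):
    # Give the small-magnitude and large-magnitude parts of a skewed
    # distribution separate uniform scales, instead of one shared scale.
    qmax = 2 ** (n_bits - 1) - 1
    split = np.median(np.abs(x))
    out = np.empty_like(x)
    for mask in (np.abs(x) <= split, np.abs(x) > split):
        scale = max(float(np.abs(x[mask]).max()), 1e-12) / qmax
        out[mask] = np.round(x[mask] / scale) * scale
    return out

rng = np.random.default_rng(6)
attn = np.exp(rng.normal(size=(4, 64)))
attn /= attn.sum(axis=1, keepdims=True)     # post-softmax values, skewed toward 0
print(f"mean abs error: {np.abs(two_region_quantize(attn) - attn).mean():.6f}")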

ENAS4D: Efficient Multi-stage CNN Architecture Search for Dynamic Inference

no code implementations 19 Sep 2020 Zhihang Yuan, Xin Liu, Bingzhe Wu, Guangyu Sun

The inference of an input sample can exit from an early stage if that stage's prediction is confident enough.
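
A minimal sketch of confidence-based early exit, assuming a hypothetical stack of stages each followed by its own classifier: inference stops at the first stage whose softmax confidence clears a threshold. The tiny random "network" and the threshold value are purely illustrative.

import numpy as np

def multi_stage_predict(x, stages, classifiers, threshold=0.9):
    # Run stages in order and exit as soon as one stage's classifier is
    # confident enough; the last stage always produces the answer.
    feat = x
    for i, (stage, clf) in enumerate(zip(stages, classifiers)):
        feat = stage(feat)
        logits = clf(feat)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        if probs.max() >= threshold or i == len(stages) - 1:
            return int(np.argmax(probs)), i          # predicted class, exit stage

rng = np.random.default_rng(7)
stages = [lambda f, W=rng.normal(size=(16, 16)): np.tanh(f @ W) for _ in range(3)]
classifiers = [lambda f, W=rng.normal(size=(16, 4)): f @ W for _ in range(3)]
pred, exit_stage = multi_stage_predict(rng.normal(size=16), stages, classifiers)
print(f"predicted class {pred}, exited after stage {exit_stage}")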

S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search

no code implementations 16 Nov 2019 Zhihang Yuan, Bingzhe Wu, Zheng Liang, Shiwan Zhao, Weichen Bi, Guangyu Sun

Recently, dynamic inference has emerged as a promising way to reduce the computational cost of deep convolutional neural networks (CNNs).

Neural Architecture Search
