Search Results for author: Mengzhao Chen

Found 11 papers, 10 papers with code

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

1 code implementation • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance on various tasks, such as text summarization and question answering.

Question Answering, Text Summarization

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

1 code implementation • 16 Nov 2023 • Yunshan Zhong, Jiawei Hu, Mingbao Lin, Mengzhao Chen, Rongrong Ji

Despite the scalable performance of vision transformers (ViTs), their dense computational costs (in both training and inference) undermine their position in industrial applications.

Quantization

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

2 code implementations • 25 Aug 2023 • Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo

To tackle this issue, we introduce Omnidirectionally calibrated Quantization (OmniQuant), a technique for LLMs that achieves good performance across diverse quantization settings while maintaining the computational efficiency of PTQ by efficiently optimizing various quantization parameters.

Common Sense Reasoning, Computational Efficiency, +3
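
For readers unfamiliar with PTQ, the sketch below shows the kind of quantization parameters such a method optimizes (step size, zero point, and a clipping factor) via simulated uniform quantization in PyTorch. The class name and the learnable-clipping parameterization are illustrative assumptions, not OmniQuant's actual implementation.

```python
# Hypothetical sketch: simulated (fake) uniform quantization of a weight tensor
# with a learnable clipping factor, illustrating the kind of quantization
# parameters a PTQ method such as OmniQuant might optimize. Not the paper's code.
import torch
import torch.nn as nn


class FakeQuantWeight(nn.Module):
    def __init__(self, n_bits: int = 4):
        super().__init__()
        self.n_bits = n_bits
        # Learnable clipping factor in (0, 1], applied to the min/max range.
        self.clip_logit = nn.Parameter(torch.zeros(1))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        gamma = torch.sigmoid(self.clip_logit)          # clipping strength
        w_max = gamma * w.max()
        w_min = gamma * w.min()
        qmax = 2 ** self.n_bits - 1
        scale = (w_max - w_min).clamp(min=1e-8) / qmax  # step size
        zero_point = torch.round(-w_min / scale)
        # Quantize, clamp to the integer grid, then de-quantize.
        w_q = torch.clamp(torch.round(w / scale) + zero_point, 0, qmax)
        return (w_q - zero_point) * scale               # a straight-through estimator is used in practice


w = torch.randn(128, 128)
quantizer = FakeQuantWeight(n_bits=4)
print(torch.mean((quantizer(w) - w) ** 2))              # quantization error (MSE)
```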

Spatial Re-parameterization for N:M Sparsity

no code implementations • 9 Jun 2023 • Yuxin Zhang, Mingbao Lin, Yunshan Zhong, Mengzhao Chen, Fei Chao, Rongrong Ji

This paper presents a Spatial Re-parameterization (SpRe) method for the N:M sparsity in CNNs.
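
As a point of reference, the minimal sketch below shows what an N:M pattern (here 2:4) means in practice: within every group of four consecutive weights, the two largest-magnitude entries are kept and the rest are zeroed. The nm_sparsify helper is hypothetical and illustrates the sparsity pattern only, not the SpRe re-parameterization.

```python
# Minimal sketch of N:M sparsity (here 2:4): in every group of 4 consecutive
# weights, keep the 2 with the largest magnitude and zero the rest.
# This illustrates the sparsity pattern only, not the SpRe method itself.
import torch


def nm_sparsify(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    flat = weight.reshape(-1, m)                      # groups of m weights
    idx = flat.abs().topk(n, dim=1).indices           # n largest per group
    mask = torch.zeros_like(flat).scatter_(1, idx, 1.0)
    return (flat * mask).reshape(weight.shape)


w = torch.randn(64, 64)                               # 64*64 is divisible by 4
w_sparse = nm_sparsify(w)
print((w_sparse != 0).float().mean())                 # ~0.5 for 2:4 sparsity
```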

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

1 code implementation • ICCV 2023 • Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo

Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens.
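
As background, a generic token-pruning step might rank patch tokens by an importance score, such as the attention they receive from the CLS token, and keep only the top-k. The sketch below (the prune_tokens helper is an assumption for illustration) shows this general idea, not DiffRate's differentiable compression-rate mechanism.

```python
# Hypothetical illustration of token pruning in a ViT block: keep the top-k
# image tokens ranked by an importance score (here, attention received from the
# CLS token). Shows the general idea of token compression only.
import torch


def prune_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor, keep: int) -> torch.Tensor:
    """tokens: (B, 1 + N, D) with CLS first; cls_attn: (B, N) importance scores."""
    cls_tok, img_tok = tokens[:, :1], tokens[:, 1:]
    idx = cls_attn.topk(keep, dim=1).indices                       # (B, keep)
    idx = idx.unsqueeze(-1).expand(-1, -1, img_tok.size(-1))       # (B, keep, D)
    kept = torch.gather(img_tok, 1, idx)
    return torch.cat([cls_tok, kept], dim=1)                       # (B, 1 + keep, D)


x = torch.randn(2, 1 + 196, 384)              # batch of 2, 196 patch tokens
attn = torch.rand(2, 196)                     # stand-in CLS-attention scores
print(prune_tokens(x, attn, keep=98).shape)   # torch.Size([2, 99, 384])
```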

MultiQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

1 code implementation • 14 May 2023 • Yunshan Zhong, Mingbao Lin, Yuyao Zhou, Mengzhao Chen, Yuxin Zhang, Fei Chao, Rongrong Ji

In this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by frequent bit-width switching of weights and activations, which limits performance.

Quantization
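
The small experiment below simulates the error accumulation from frequent bit-width switching described above by repeatedly re-quantizing a tensor at different bit-widths and comparing the result with a single quantization pass. The fake_quant helper and the bit-width schedule are assumptions for illustration, not MultiQuant code.

```python
# Small numeric sketch of the motivation: repeatedly switching a tensor between
# bit-widths accumulates error compared with a single quantization pass.
# Illustrative only, not MultiQuant itself.
import torch


def fake_quant(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    qmax = 2 ** n_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / qmax
    zp = torch.round(-x.min() / scale)
    return (torch.clamp(torch.round(x / scale) + zp, 0, qmax) - zp) * scale


x = torch.randn(1024)
direct = fake_quant(x, 2)                 # quantize once at 2 bits
chained = x.clone()
for bits in (8, 4, 8, 4, 2):              # simulate frequent bit-width switching
    chained = fake_quant(chained, bits)

print("single 2-bit pass:", torch.mean((direct - x) ** 2).item())
print("switched chain   :", torch.mean((chained - x) ** 2).item())  # usually larger: rounding errors compound
```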

SMMix: Self-Motivated Image Mixing for Vision Transformers

1 code implementation • ICCV 2023 • Mengzhao Chen, Mingbao Lin, Zhihang Lin, Yuxin Zhang, Fei Chao, Rongrong Ji

Thanks to the subtle design of its self-motivated paradigm, SMMix incurs smaller training overhead and achieves better performance than other CutMix variants.
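
For context, vanilla CutMix, the baseline family SMMix improves on, pastes a random rectangle from one image into another and mixes the labels in proportion to the pasted area. A minimal sketch follows; the cutmix helper illustrates the baseline only, not SMMix's self-motivated variant.

```python
# Reference sketch of vanilla CutMix: paste a random rectangle from one image
# into another and mix the labels in proportion to the pasted area.
import torch


def cutmix(x_a, x_b, y_a, y_b, lam: float = 0.5):
    """x_*: (B, C, H, W) images; y_*: (B, num_classes) one-hot labels."""
    _, _, h, w = x_a.shape
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy = torch.randint(h - cut_h + 1, (1,)).item()
    cx = torch.randint(w - cut_w + 1, (1,)).item()
    mixed = x_a.clone()
    mixed[:, :, cy:cy + cut_h, cx:cx + cut_w] = x_b[:, :, cy:cy + cut_h, cx:cx + cut_w]
    area = (cut_h * cut_w) / (h * w)                     # actual pasted fraction
    return mixed, (1 - area) * y_a + area * y_b


imgs_a, imgs_b = torch.rand(8, 3, 224, 224), torch.rand(8, 3, 224, 224)
labels_a = torch.eye(1000)[torch.randint(1000, (8,))]
labels_b = torch.eye(1000)[torch.randint(1000, (8,))]
mixed_imgs, mixed_labels = cutmix(imgs_a, imgs_b, labels_a, labels_b, lam=0.7)
```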

Super Vision Transformer

1 code implementation • 23 May 2022 • Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models while even improving performance.

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

1 code implementation • 8 Mar 2022 • Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

Our proposed CF-ViT is motivated by two important observations about modern ViT models: (1) coarse-grained patch splitting can locate the informative regions of an input image.
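
A toy sketch of the coarse-to-fine idea follows: split the image into coarse patches, score each patch for informativeness (here a stand-in variance score), and re-split only the most informative coarse cells into finer patches. The coarse_to_fine helper and the variance-based scoring are assumptions for illustration; CF-ViT's actual region identification is learned by the coarse-stage network rather than hand-crafted.

```python
# Toy sketch of coarse-to-fine patch splitting: coarse patchify, score each
# patch for informativeness, then re-split only the top-k informative cells.
import torch


def coarse_to_fine(img: torch.Tensor, coarse: int = 32, fine: int = 16, top_k: int = 10):
    """img: (C, H, W). Returns coarse patches plus fine patches for the top-k informative cells."""
    c, h, w = img.shape
    patches = img.unfold(1, coarse, coarse).unfold(2, coarse, coarse)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c, coarse, coarse)
    scores = patches.var(dim=(1, 2, 3))                  # stand-in informativeness score
    keep = scores.topk(top_k).indices
    fine_patches = [
        p.unfold(1, fine, fine).unfold(2, fine, fine).reshape(c, -1, fine, fine)
        for p in patches[keep]
    ]
    return patches, torch.cat(fine_patches, dim=1)       # coarse grid + refined cells


img = torch.rand(3, 224, 224)
coarse_p, fine_p = coarse_to_fine(img)
print(coarse_p.shape, fine_p.shape)   # (49, 3, 32, 32) and (3, 40, 16, 16)
```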

OptG: Optimizing Gradient-driven Criteria in Network Sparsity

1 code implementation • 30 Jan 2022 • Yuxin Zhang, Mingbao Lin, Mengzhao Chen, Fei Chao, Rongrong Ji

We prove that supermask training accumulates the criteria of gradient-driven sparsity for both removed and preserved weights, which can partly solve the independence paradox.
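
To make the term concrete, the minimal sketch below shows supermask training in the common edge-popup style: each frozen weight has a learnable score, the forward pass keeps only the top-k scored weights, and a straight-through estimator passes gradients to the scores of both removed and preserved weights. The class names are hypothetical and this is a generic illustration, not OptG's exact formulation.

```python
# Minimal supermask-training sketch: learnable per-weight scores, a top-k binary
# mask in the forward pass, and a straight-through estimator in the backward pass.
import torch
import torch.nn as nn


class TopKMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, sparsity):
        k = int(scores.numel() * (1 - sparsity))            # number of weights to keep
        threshold = scores.flatten().topk(k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                               # straight-through estimator


class SupermaskLinear(nn.Linear):
    def __init__(self, in_f, out_f, sparsity=0.9):
        super().__init__(in_f, out_f, bias=False)
        self.weight.requires_grad_(False)                   # weights stay frozen
        self.scores = nn.Parameter(torch.randn_like(self.weight) * 0.01)
        self.sparsity = sparsity

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.sparsity)
        return nn.functional.linear(x, self.weight * mask)


layer = SupermaskLinear(64, 32, sparsity=0.9)
loss = layer(torch.randn(4, 64)).sum()
loss.backward()
print(layer.scores.grad.abs().mean())                       # scores of all weights receive gradients
```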

Fine-grained Data Distribution Alignment for Post-Training Quantization

1 code implementation • 9 Sep 2021 • Yunshan Zhong, Mingbao Lin, Mengzhao Chen, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji

While post-training quantization owes its popularity largely to the fact that it avoids access to the original, complete training dataset, its poor performance also stems from the scarcity of calibration images.

Quantization
