Search Results for author: Fangmin Chen

Found 6 papers, 3 papers with code

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

1 code implementation • 16 Aug 2024 • Chao Zeng, Songwei Liu, Yusheng Xie, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei

Based on the W2*A8 quantization configuration on the LLaMA-7B model, it achieved a WikiText2 perplexity of 7.59 (2.17$\downarrow$ vs. 9.76 for AffineQuant).
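As a rough illustration of what a W2*A8 setting means (2-bit weights, 8-bit activations), the sketch below applies simulated ("fake") symmetric quantization to a weight matrix and an activation tensor. It is not the ABQ-LLM method or its kernels; the per-tensor scaling and the tensor shapes are assumptions for demonstration only.

```python
# Illustrative sketch only: simulated low-bit quantize-dequantize in the spirit
# of a W2*A8 configuration. Not the ABQ-LLM calibration or inference kernels.
import torch

def fake_quantize(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Symmetric per-tensor quantize-dequantize to n_bits (assumed scheme)."""
    qmax = 2 ** (n_bits - 1) - 1               # 1 for 2-bit, 127 for 8-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

weight = torch.randn(4096, 4096)               # hypothetical LLaMA-style linear weight
activation = torch.randn(1, 4096)

w_q = fake_quantize(weight, n_bits=2)           # "W2"
a_q = fake_quantize(activation, n_bits=8)       # "A8"
output = a_q @ w_q.t()                          # low-bit matmul simulated in float
```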

Model Compression • Quantization

Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models

no code implementations • 13 Aug 2024 • Chenqian Yan, Songwei Liu, Hongjian Liu, Xurui Peng, Xiaojian Wang, Fangmin Chen, Lean Fu, Xing Mei

On the flip side, while there are many compact models tailored for edge devices that can reduce these demands, they often compromise on semantic integrity and visual quality when compared to full-sized SDMs.

Collaborative Inference • Diversity +1

FoldGPT: Simple and Effective Large Language Model Compression Scheme

no code implementations • 1 Jul 2024 • Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen

Based on this observation, we propose an efficient model volume compression strategy, termed FoldGPT, which combines block removal and block parameter sharing. This strategy consists of three parts: (1) Based on the learnable gating parameters, we determine the block importance ranking while modeling the coupling effect between blocks.
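To make the "learnable gating parameters" idea concrete, the sketch below attaches a scalar gate to each block so that gate magnitudes can be used to rank block importance after training. This is a minimal sketch under assumed shapes and placeholder blocks, not the FoldGPT implementation or its removal/sharing stages.

```python
# Minimal sketch (not the authors' code): learnable per-block gates whose
# magnitudes after training can serve as a block importance ranking.
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.gate = nn.Parameter(torch.ones(1))   # learnable gating parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # gate -> 0 means the block contributes nothing to the residual path,
        # so a small |gate| suggests the block is a removal/sharing candidate.
        return x + self.gate * self.block(x)

blocks = nn.ModuleList(GatedBlock(nn.Linear(512, 512)) for _ in range(12))
x = torch.randn(2, 512)
for blk in blocks:
    x = blk(x)

# Hypothetical importance ranking derived from the trained gate magnitudes.
importance = sorted(range(len(blocks)),
                    key=lambda i: blocks[i].gate.abs().item(),
                    reverse=True)
```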

Language Modelling • Large Language Model +1

SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity

no code implementations • 30 Oct 2023 • Haitao Xu, Songwei Liu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian Yan, Liangqiang Li, Lean Fu, Xin Pan, Fangmin Chen

Our framework consists of two parts: (a) A fine-grained kernel sparsity schema with a sparsity granularity between structured pruning and unstructured pruning.
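The sketch below illustrates the general idea of a sparsity granularity between structured and unstructured pruning: weights are zeroed in small contiguous groups rather than individually or as whole channels. The group size of 4 and the 50% sparsity target are assumptions for illustration, not values taken from SparseByteNN.

```python
# Hedged illustration of fine-grained group sparsity (not the SparseByteNN schema).
import torch

def group_prune(weight: torch.Tensor, group_size: int = 4, sparsity: float = 0.5):
    out_c, in_c = weight.shape
    groups = weight.reshape(out_c, in_c // group_size, group_size)
    scores = groups.norm(dim=-1)                       # one saliency score per group
    k = int(scores.numel() * sparsity)
    threshold = scores.flatten().kthvalue(k).values    # cutoff for lowest-scoring groups
    mask = (scores > threshold).unsqueeze(-1).float()  # keep only strong groups
    return (groups * mask).reshape(out_c, in_c)

w = torch.randn(64, 128)
w_sparse = group_prune(w)   # roughly half of the 4-wide groups zeroed
```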

Network Pruning

Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution

1 code implementation • 5 Aug 2023 • Yong Liu, Hang Dong, Boyang Liang, Songwei Liu, Qingji Dong, Kai Chen, Fangmin Chen, Lean Fu, Fei Wang

Since the high resolution of intermediate features in SISR models increases memory and computational requirements, efficient SISR transformers are increasingly favored.

Image Super-Resolution

Residual Local Feature Network for Efficient Super-Resolution

2 code implementations • 16 May 2022 • Fangyuan Kong, Mingxi Li, Songwei Liu, Ding Liu, Jingwen He, Yang Bai, Fangmin Chen, Lean Fu

Moreover, we revisit the popular contrastive loss and observe that the choice of intermediate features from its feature extractor has a great influence on performance.
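The sketch below shows why the choice of intermediate layer matters: a contrastive-style loss computed on features from one slice of a pretrained extractor behaves very differently from one computed on another. The use of VGG-19, the layer index 8, and the bilinear-upsampled negative are assumptions for illustration, not the configuration the authors adopted.

```python
# Rough sketch of a contrastive-style feature loss; the tapped layer, extractor,
# and negative sample are illustrative assumptions, not the RLFN recipe.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

features = vgg19(weights=None).features[:8].eval()   # tap an early intermediate layer

def contrastive_loss(sr, hr, negative):
    with torch.no_grad():
        f_hr, f_neg = features(hr), features(negative)
    f_sr = features(sr)
    # Pull SR features toward the HR "positive", push away from the "negative".
    return F.l1_loss(f_sr, f_hr) / (F.l1_loss(f_sr, f_neg) + 1e-7)

sr = torch.rand(1, 3, 64, 64)
hr = torch.rand(1, 3, 64, 64)
neg = F.interpolate(torch.rand(1, 3, 32, 32), scale_factor=2, mode="bilinear")
loss = contrastive_loss(sr, hr, neg)
```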

Image Super-Resolution • SSIM
