Search Results for author: Yaoxiu Lian

Found 3 papers, 0 papers with code

SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting

no code implementations • 11 Apr 2025 • Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai

Early exiting has recently emerged as a promising technique for accelerating large language models (LLMs) by effectively reducing the hardware computation and memory access.
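The snippet above describes early exiting only at a high level. As a rough illustration of the general idea, and not SpecEE's actual design, the following PyTorch sketch runs a toy decoder layer by layer and exits once the intermediate prediction is confident enough; the class and parameter names (EarlyExitLM, exit_threshold) are hypothetical.

```python
import torch
import torch.nn as nn

class EarlyExitLM(nn.Module):
    """Toy decoder that can exit before running all layers (illustrative only)."""

    def __init__(self, num_layers=12, d_model=256, vocab_size=32000, exit_threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        # A shared LM head is used to probe the prediction after every layer.
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.exit_threshold = exit_threshold

    def forward(self, hidden):
        # hidden: (batch, seq_len, d_model)
        for i, layer in enumerate(self.layers):
            hidden = layer(hidden)
            # Probe the vocabulary distribution for the last position.
            logits = self.lm_head(hidden[:, -1])
            confidence = torch.softmax(logits, dim=-1).max(dim=-1).values
            # Exit early when every sample is confident, skipping the remaining
            # layers' computation and memory access.
            if bool((confidence > self.exit_threshold).all()):
                return logits, i + 1
        return logits, len(self.layers)

# Usage with dummy hidden states.
model = EarlyExitLM()
x = torch.randn(1, 8, 256)
logits, layers_used = model(x)
print(f"exited after {layers_used} layers")
```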

Language Modeling, Language Modelling, +3

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective

no code implementations • 6 Oct 2024 • Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai

We compare the performance of the same optimization methods across different hardware platforms and the performance of different methods on the same hardware platform.

Language Modeling, Language Modelling, +3

Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization

no code implementations • 28 Nov 2023 • Jinhao Li, Jiaming Xu, Shiyao Li, Shan Huang, Jun Liu, Yaoxiu Lian, Guohao Dai

To tackle these challenges and enable fast and efficient LLM inference on GPUs, we propose the following techniques in this paper.
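The title points to mixed 2/4/16-bit weights with asynchronous dequantization, but the snippet does not spell out the scheme. The following NumPy sketch only illustrates the general idea of per-group 2-bit weight quantization and dequantization; the function and parameter names (quantize_2bit, group_size) are hypothetical and bear no relation to the paper's actual GPU kernels.

```python
import numpy as np

def quantize_2bit(weights, group_size=128):
    """Toy per-group 2-bit quantization (illustrative sketch, not the paper's method)."""
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    # 2 bits give 4 levels (codes 0..3); guard against constant groups.
    scale = np.maximum((w_max - w_min) / 3.0, 1e-8)
    codes = np.clip(np.round((w - w_min) / scale), 0, 3).astype(np.uint8)
    return codes, scale.astype(np.float16), w_min.astype(np.float16)

def dequantize_2bit(codes, scale, w_min):
    """Reconstruct approximate weights from 2-bit codes plus per-group scale and offset."""
    return codes.astype(np.float16) * scale + w_min

# Example: quantize a 256x128 weight matrix and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
codes, scale, w_min = quantize_2bit(w)
w_hat = dequantize_2bit(codes, scale, w_min).reshape(w.shape)
print("mean abs error:", np.abs(w - w_hat.astype(np.float32)).mean())
```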

Quantization
