Search Results for author: Xinhao Cheng

Found 4 papers, 3 papers with code

A Multi-Level Superoptimizer for Tensor Programs

1 code implementation · 9 May 2024 · Mengdi Wu, Xinhao Cheng, Oded Padon, Zhihao Jia

We introduce Mirage, the first multi-level superoptimizer for tensor programs.


Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

No code implementations · 23 Dec 2023 · Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia

In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data.

Tasks: Language Modelling, Large Language Model

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

3 code implementations · 16 May 2023 · Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for offloading-based LLM inference, while preserving the same generative performance.

Tasks: Decoder, Language Modelling
