Search Results for author: Huanqi Cao

Found 4 papers, 2 papers with code

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

no code implementations • 17 Jun 2024 • Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significant Time-to-First-Token (TTFT) latency.
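The snippet above rests on a single contrast: dense attention scores every query against every key, which is quadratic in sequence length, while a structured sparse pattern scores only a small, fixed set of keys per query. The sketch below illustrates that contrast in NumPy; it is not SampleAttention's adaptive method, and the window-plus-sink layout, function names, and sizes are assumptions chosen for demonstration.

```python
# Illustrative sketch only (not SampleAttention): dense attention builds an
# n x n score matrix, while a structured sparse variant scores only a few
# "sink" keys plus a local causal window per query.
import numpy as np

def dense_attention(q, k, v):
    # Full n x n score matrix -> quadratic time and memory in sequence length n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def windowed_sparse_attention(q, k, v, window=64, sinks=4):
    # Each query attends only to the first `sinks` keys and a local window,
    # so the work per query is O(window + sinks) instead of O(n).
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        cols = np.unique(np.concatenate([
            np.arange(min(sinks, i + 1)),               # attention "sink" keys
            np.arange(max(0, i - window + 1), i + 1),   # local causal window
        ]))
        s = q[i] @ k[cols].T / np.sqrt(d)
        w = np.exp(s - s.max())
        w /= w.sum()
        out[i] = w @ v[cols]
    return out

rng = np.random.default_rng(0)
n, d = 512, 32
q, k, v = rng.standard_normal((3, n, d))
# Dense scores n*n query/key pairs; the sparse variant scores at most
# (window + sinks) per query, which is what cuts prefill (TTFT) cost.
print(dense_attention(q, k, v).shape, windowed_sparse_attention(q, k, v).shape)
```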

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

no code implementations • 11 Jul 2023 • Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai

To accelerate DNN computation, tensor compilers have been proposed to generate efficient code for different domain-specific accelerators.
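As a rough illustration of what "generating efficient code" means here, the sketch below contrasts two separately executed elementwise operators with a single fused loop of the kind a tensor compiler might emit, so the intermediate tensor never round-trips through memory. This is not PowerFusion's instruction-level graph IR; the operators and function names are assumptions chosen for clarity.

```python
# Illustrative sketch only (not PowerFusion's IR): operator fusion as a
# tensor-compiler transformation that removes an intermediate tensor.
import numpy as np

def unfused(x):
    # Two separate operators: the intermediate tensor t is materialized
    # in memory and then read back by the second operator.
    t = np.maximum(x, 0.0)        # ReLU
    return t * 2.0 + 1.0          # scale-and-shift

def fused_kernel(x, out):
    # The shape of compiler-generated fused code: a single loop applies both
    # operators per element, so no intermediate tensor is ever written.
    for i in range(x.size):
        v = x.flat[i]
        v = v if v > 0.0 else 0.0     # ReLU
        out.flat[i] = v * 2.0 + 1.0   # scale-and-shift

x = np.random.default_rng(1).standard_normal((64, 64))
out = np.empty_like(x)
fused_kernel(x, out)
assert np.allclose(unfused(x), out)
```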
