1 code implementation • 22 Jul 2024 • Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng
This study introduces the vTensor, an innovative tensor structure for LLM inference based on GPU virtual memory management (VMM).