1 code implementation • 29 Feb 2024 • Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia
This is because existing systems cannot handle workloads that include a mix of inference and PEFT finetuning requests.
1 code implementation • 30 Dec 2021 • Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma
Besides, an efficient deployment flow for the mobile CPU is developed, achieving up to 7. 46$\times$ inference acceleration on an octa-core ARM CPU.