1 code implementation • 14 Nov 2023 • Mu Yuan, Lan Zhang, Xiang-Yang Li
Our protocol, Secure Transformer Inference Protocol (STIP), can be applied to real-world services like ChatGPT.
1 code implementation • journal 2023 • Mu Yuan, Lan Zhang, Xuanke You, Xiang-Yang Li
The resource efficiency of video analytics workloads is critical for large-scale deployments on edge nodes and cloud clusters.
3 code implementations • 28 Sep 2022 • Mu Yuan, Lan Zhang, Zimu Zheng, Yi-Nan Zhang, Xiang-Yang Li
The cost efficiency of model inference is critical to real-world machine learning (ML) applications, especially for delay-sensitive tasks and resource-limited devices.
3 code implementations • 28 Sep 2022 • Mu Yuan, Lan Zhang, Fengxiang He, Xueting Tong, Miao-Hui Song, Zhengyuan Xu, Xiang-Yang Li
Previous efforts have tailored effective solutions for many applications, but left two essential questions unanswered: (1) theoretical filterability of an inference workload to guide the application of input filtering techniques, thereby avoiding the trial-and-error cost for resource-constrained mobile applications; (2) robust discriminability of feature embedding to allow input filtering to be widely effective for diverse inference tasks and input content.
no code implementations • 8 Feb 2020 • Mu Yuan, Lan Zhang, Xiang-Yang Li, Hui Xiong
With limited computing resources and stringent delay, given a data stream and a collection of applicable resource-hungry deep-learning models, we design a novel approach to adaptively schedule a subset of these models to execute on each data item, aiming to maximize the value of the model output (e.g., the number of high-confidence labels).