Search Results for author: Mu Yuan

Found 5 papers, 4 papers with code

Secure Transformer Inference

1 code implementation • 14 Nov 2023 • Mu Yuan, Lan Zhang, Xiang-Yang Li

Our protocol, Secure Transformer Inference Protocol (STIP), can be applied to real-world services like ChatGPT.

PacketGame: Multi-Stream Packet Gating for Concurrent Video Inference at Scale

1 code implementation • journal 2023 • Mu Yuan, Lan Zhang, Xuanke You, Xiang-Yang Li

The resource efficiency of video analytics workloads is critical for large-scale deployments on edge nodes and cloud clusters.

Video Compression

MLink: Linking Black-Box Models from Multiple Domains for Collaborative Inference

3 code implementations • 28 Sep 2022 • Mu Yuan, Lan Zhang, Zimu Zheng, Yi-Nan Zhang, Xiang-Yang Li

The cost efficiency of model inference is critical to real-world machine learning (ML) applications, especially for delay-sensitive tasks and resource-limited devices.

Collaborative Inference • Multi-Task Learning • +1 more

InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference

3 code implementations • 28 Sep 2022 • Mu Yuan, Lan Zhang, Fengxiang He, Xueting Tong, Miao-Hui Song, Zhengyuan Xu, Xiang-Yang Li

Previous efforts have tailored effective solutions for many applications, but left two essential questions unanswered: (1) theoretical filterability of an inference workload to guide the application of input filtering techniques, thereby avoiding the trial-and-error cost for resource-constrained mobile applications; (2) robust discriminability of feature embedding to allow input filtering to be widely effective for diverse inference tasks and input content.

Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling

no code implementations • 8 Feb 2020 • Mu Yuan, Lan Zhang, Xiang-Yang Li, Hui Xiong

With limited computing resources and stringent delay, given a data stream and a collection of applicable resource-hungry deep-learning models, we design a novel approach to adaptively schedule a subset of these models to execute on each data item, aiming to maximize the value of the model output (e.g., the number of high-confidence labels).

Image Retrieval • Management • +3 more
