Search Results for author: Hsien-Hsin S. Lee

Found 10 papers, 3 papers with code

Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

no code implementations • 10 Mar 2023 • Haiyang Huang, Newsha Ardalani, Anna Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin Lee

We propose three optimization techniques to mitigate sources of inefficiency, namely (1) dynamic gating, (2) expert buffering, and (3) expert load balancing.

Language Modelling, Machine Translation

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

1 code implementation • 26 Jan 2023 • Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, G. Edward Suh

Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to $100{,}000$ queries per second -- a $>100\times$ throughput improvement over a CPU-based baseline -- while maintaining model accuracy.

Information Retrieval, Language Modelling, +1 more

Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems

no code implementations • 12 Dec 2022 • Hanieh Hashemi, Wenjie Xiong, Liu Ke, Kiwan Maeng, Murali Annavaram, G. Edward Suh, Hsien-Hsin S. Lee

This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns.

Recommendation Systems

Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications

no code implementations • 8 Jun 2022 • Vivek Parmar, Syed Shakib Sarwar, Ziyun Li, Hsien-Hsin S. Lee, Barbara De Salvo, Manan Suri

Low-power Edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of the Metaverse.

Hand Detection, Quantization

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

no code implementations • 8 Jan 2020 • Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu

Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, and it accounts for significant compute demand on cloud infrastructure.

Distributed, Parallel, and Cluster Computing
