Search Results for author: Leyang Xue

Found 4 papers, 1 paper with code

MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

1 code implementation • 25 Jan 2024 • Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina

This paper presents MoE-Infinity, a cost-efficient mixture-of-expert (MoE) serving system that realizes activation-aware expert offloading.

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

no code implementations • 25 Jan 2024 • Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai

This paper presents ServerlessLLM, a locality-enhanced serverless inference system for Large Language Models (LLMs).

Enhancing the long-term performance of recommender system

no code implementations • 1 Apr 2019 • Leyang Xue, Peng Zhang, An Zeng

Notably, an optimal parameter n* of ARL existed in long-term recommendation, indicating that there is a trade-off between keeping diversity of item and user's preference to maximize the long-term recommendation accuracy.

Recommendation Systems

Predictability of diffusion-based recommender systems

no code implementations • 29 Mar 2019 • Peng Zhang, Leyang Xue, An Zeng

The results show that the higher recommendation accuracy based on diffusion algorithms can still be achieved by optimizing the way of resource allocation on a density network.

Recommendation Systems
