Search Results for author: Yongji Wu

Found 11 papers, 3 papers with code

Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement

no code implementations · 5 Jul 2024 · Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo

The cost of even a single failure is significant, as all GPUs need to wait idle until the failure is resolved, potentially losing considerable training progress as training has to restart from checkpoints.

VcLLM: Video Codecs are Secretly Tensor Codecs

no code implementations · 29 Jun 2024 · Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

As the parameter size of large language models (LLMs) continues to expand, the demands for a large memory footprint and high communication bandwidth have become significant bottlenecks for the training and inference of LLMs.

Adaptive Skeleton Graph Decoding

no code implementations · 19 Feb 2024 · Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo

Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs.

Curator: Efficient Indexing for Multi-Tenant Vector Databases

no code implementations · 13 Jan 2024 · Yicheng Jin, Yongji Wu, WenJun Hu, Bruce M. Maggs, Xiao Zhang, Danyang Zhuo

Vector databases have emerged as key enablers for bridging intelligent applications with unstructured data, providing generic search and management support for embedding vectors extracted from the raw unstructured data.

Clustering

AR Visualization System for Ship Detection and Recognition Based on AI

no code implementations · 21 Nov 2023 · Ziqi Ye, Limin Huang, Yongji Wu, Min Hu

The combination of artificial intelligence and augmented reality technology has become a development trend.

Information Retrieval, object-detection, +3 more

Punica: Multi-Tenant LoRA Serving

1 code implementation · 28 Oct 2023 · Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.

Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

no code implementations · 10 May 2022 · Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu

With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network.

AutoML, BIG-bench Machine Learning, +5 more

How Powerful is Graph Convolution for Recommendation?

1 code implementation · 17 Aug 2021 · Yifei Shen, Yongji Wu, Yao Zhang, Caihua Shan, Jun Zhang, Khaled B. Letaief, Dongsheng Li

In this paper, we endeavor to obtain a better understanding of GCN-based CF methods via the lens of graph signal processing.

Collaborative Filtering

Linear-Time Self Attention with Codeword Histogram for Efficient Recommendation

1 code implementation · 28 May 2021 · Yongji Wu, Defu Lian, Neil Zhenqiang Gong, Lu Yin, Mingyang Yin, Jingren Zhou, Hongxia Yang

Inspired by the idea of vector quantization that uses cluster centroids to approximate items, we propose LISA (LInear-time Self Attention), which enjoys both the effectiveness of vanilla self-attention and the efficiency of sparse attention.

Quantization, Sequential Recommendation

Rethinking Lifelong Sequential Recommendation with Incremental Multi-Interest Attention

no code implementations · 28 May 2021 · Yongji Wu, Lu Yin, Defu Lian, Mingyang Yin, Neil Zhenqiang Gong, Jingren Zhou, Hongxia Yang

With the rapid development of these services in the last two decades, users have accumulated a massive amount of behavior data.

Sequential Recommendation
