Search Results for author: Matthew Lentz

Found 7 papers, 1 paper with code

HoneyBee: Efficient Role-based Access Control for Vector Databases via Dynamic Partitioning

1 code implementation • 2 May 2025 • Hongbin Zhong, Matthew Lentz, Nina Narodytska, Adriana Szekeres, Kexin Rong

As vector databases gain traction in enterprise applications, robust access control has become critical to safeguard sensitive data.
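From the title, the core idea appears to be grouping vectors by their access permissions so that enforcement becomes partition routing rather than per-vector filtering. Below is a minimal, brute-force sketch of that general pattern, not HoneyBee's actual design: the class and method names are invented, and a real system would keep an ANN index per partition instead of scanning.

```python
# Illustrative sketch only: a toy role-partitioned vector store.
import numpy as np

class RolePartitionedStore:
    """Vectors are grouped into partitions keyed by the set of roles allowed
    to read them; a query only scans the partitions the user may access."""

    def __init__(self, dim):
        self.dim = dim
        self.partitions = {}  # frozenset(roles) -> list of (id, vector)

    def insert(self, vec_id, vector, allowed_roles):
        key = frozenset(allowed_roles)
        self.partitions.setdefault(key, []).append((vec_id, np.asarray(vector)))

    def search(self, query, user_roles, k=5):
        query = np.asarray(query)
        candidates = []
        for roles, items in self.partitions.items():
            if roles & set(user_roles):  # user holds at least one allowed role
                for vec_id, vec in items:
                    score = float(query @ vec)  # inner-product similarity
                    candidates.append((score, vec_id))
        return sorted(candidates, reverse=True)[:k]

store = RolePartitionedStore(dim=4)
store.insert("doc1", [0.1, 0.9, 0.0, 0.0], {"finance"})
store.insert("doc2", [0.2, 0.8, 0.1, 0.0], {"hr", "finance"})
print(store.search([0.1, 1.0, 0.0, 0.0], user_roles={"hr"}, k=2))
```

The payoff of partitioning in this pattern is that the access check happens once per partition rather than once per vector, which matters when the per-vector filter would otherwise sit on the hot path of every query.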

HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs

no code implementations • 4 Apr 2025 • Yongji Wu, Xueshen Liu, Shuowei Jin, Ceyu Xu, Feng Qian, Z. Morley Mao, Matthew Lentz, Danyang Zhuo, Ion Stoica

However, existing solutions are agnostic to the performance characteristics of different MoE model components (i.e., attention and expert) and do not fully utilize each GPU's compute capability.

Mixture-of-Experts
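The excerpt hints that attention and expert layers stress hardware differently (attention tends to be memory-bandwidth-bound, expert FFNs compute-bound), so heterogeneous GPUs can be matched to the component they suit best. The sketch below illustrates only that general matching idea; the GPU specs, scoring rule, and greedy policy are assumptions for illustration, not HeterMoE's scheduling algorithm.

```python
# Toy component-to-GPU matching for an MoE layer. All numbers and the scoring
# rule are illustrative assumptions, not HeterMoE's actual algorithm.
from dataclasses import dataclass

@dataclass
class Gpu:
    name: str
    tflops: float      # peak compute (rough, illustrative figures)
    mem_bw_gbs: float  # memory bandwidth

@dataclass
class Component:
    name: str
    compute_bound: bool  # expert FFNs ~ compute-bound; attention ~ bandwidth-bound

def score(comp, gpu):
    # Prefer compute-rich GPUs for compute-bound work, bandwidth-rich otherwise.
    return gpu.tflops if comp.compute_bound else gpu.mem_bw_gbs

def assign(components, gpus):
    placement, free = {}, list(gpus)
    # Greedy: place each component on the free GPU that scores highest for it.
    for comp in sorted(components, key=lambda c: c.compute_bound, reverse=True):
        best = max(free, key=lambda g: score(comp, g))
        free.remove(best)
        placement[comp.name] = best.name
    return placement

gpus = [Gpu("H100", tflops=990, mem_bw_gbs=3350), Gpu("V100", tflops=125, mem_bw_gbs=900)]
comps = [Component("expert_ffn", compute_bound=True), Component("attention", compute_bound=False)]
print(assign(comps, gpus))  # {'expert_ffn': 'H100', 'attention': 'V100'}
```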

Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement

no code implementations • 5 Jul 2024 • Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo

The cost of even a single failure is significant: all GPUs must sit idle until the failure is resolved, and considerable training progress can be lost because training has to restart from a checkpoint.

Mixture-of-Experts
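The snippet explains why resilience matters: without it, one failed GPU idles the whole job until a checkpoint restart. A minimal sketch of the general direction the title suggests, replicating experts across GPUs so that losing one GPU still leaves every expert reachable, follows; the round-robin replication rule is a made-up illustration, not Lazarus's adaptive placement algorithm.

```python
# Toy replicated expert placement: each expert lives on r GPUs, so any single
# GPU failure leaves all experts servable. Purely illustrative of the idea.
from itertools import cycle

def place_experts(num_experts, gpu_ids, replicas=2):
    placement = {e: [] for e in range(num_experts)}
    rr = cycle(gpu_ids)  # round-robin over GPUs
    for e in range(num_experts):
        while len(placement[e]) < replicas:
            g = next(rr)
            if g not in placement[e]:
                placement[e].append(g)
    return placement

def survives(placement, failed_gpu):
    # Training can continue iff every expert has a replica off the failed GPU.
    return all(any(g != failed_gpu for g in gpus) for gpus in placement.values())

plan = place_experts(num_experts=8, gpu_ids=[0, 1, 2, 3], replicas=2)
print(plan)
print(all(survives(plan, g) for g in [0, 1, 2, 3]))  # True: tolerates any single failure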

VcLLM: Video Codecs are Secretly Tensor Codecs

no code implementations • 29 Jun 2024 • Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth has become a significant bottleneck for the training and inference of LLMs.
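The title's premise, that a video codec can double as a tensor codec, can be illustrated by quantizing a float tensor to 8-bit, packing it into grayscale frames, and pushing the frames through a standard encoder. The sketch below shows only that framing of the idea and is not VcLLM's pipeline; it assumes an `ffmpeg` binary on PATH, and the frame size and codec settings are arbitrary choices.

```python
# Illustrative "tensors as video" sketch: quantize, pack into frames, encode.
import subprocess
import numpy as np

def tensor_to_frames(t, h=64, w=64):
    # Min-max quantize to uint8 and pad to a whole number of h*w frames.
    lo, hi = float(t.min()), float(t.max())
    q = np.round((t - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8).ravel()
    q = np.pad(q, (0, (-q.size) % (h * w)))
    return q.reshape(-1, h, w), (lo, hi)  # keep (lo, hi) to invert the mapping

def encode_with_video_codec(frames, out_path="tensor.mp4"):
    h, w = frames.shape[1:]
    cmd = ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "gray",
           "-s", f"{w}x{h}", "-r", "30", "-i", "-",
           "-c:v", "libx264", "-qp", "0", out_path]  # qp 0 ~ lossless x264
    subprocess.run(cmd, input=frames.tobytes(), check=True)

weights = np.random.randn(32, 512).astype(np.float32)
frames, scale = tensor_to_frames(weights)
encode_with_video_codec(frames)
```

Decoding would reverse the steps (demux frames, dequantize with the stored scale); choosing a lossy codec setting instead of `-qp 0` trades reconstruction error for compression ratio.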

Adaptive Skeleton Graph Decoding

no code implementations • 19 Feb 2024 • Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo

Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs.
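Skeleton-style decoding attacks this cost by first producing a short outline and then expanding outline nodes concurrently, subject to dependencies between them. The sketch below shows that control flow with a stand-in `generate` function in place of a real LLM call; the dependency handling is a generic topological expansion, not the paper's adaptive algorithm.

```python
# Generic skeleton-graph expansion: expand outline nodes in parallel once
# their dependencies are done. `generate` stands in for a real LLM request.
from concurrent.futures import ThreadPoolExecutor

def generate(prompt):
    return f"<expansion of: {prompt}>"  # placeholder for an LLM call

def expand_skeleton(nodes, deps):
    """nodes: {id: outline text}; deps: {id: set of ids it depends on}."""
    done, results = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(nodes):
            # All nodes whose dependencies are satisfied can run concurrently.
            ready = [n for n in nodes if n not in done and deps.get(n, set()) <= done]
            futures = {n: pool.submit(generate, nodes[n]) for n in ready}
            for n, f in futures.items():
                results[n] = f.result()
                done.add(n)
    return results

nodes = {1: "define the problem", 2: "survey methods", 3: "compare 1 and 2"}
deps = {3: {1, 2}}  # node 3 must wait for nodes 1 and 2
print(expand_skeleton(nodes, deps))
```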

Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

no code implementations • 10 May 2022 • Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu

With the ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network.

AutoML • BIG-bench Machine Learning • +5
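As context for the title, a common formulation of this problem is choosing, for each stage of an inference workflow, whether it runs on the edge device or in the cloud so as to minimize total compute plus transfer time. The brute-force sketch below illustrates that generic formulation with made-up costs; it is not the paper's optimizer.

```python
# Toy edge-vs-cloud placement for a linear inference workflow: minimize total
# compute time plus transfer time whenever data crosses the link. Numbers are
# made up for illustration.
from itertools import product

# (name, edge_ms, cloud_ms, input_kb) per stage of a linear workflow
stages = [
    ("decode",    5.0, 1.0, 500),   # raw frame arrives at the edge
    ("detect",   80.0, 8.0, 200),
    ("classify", 40.0, 4.0,  10),
]
LINK_MS_PER_KB = 0.4  # assumed edge<->cloud transfer cost

def plan_cost(plan):
    cost, data_at = 0.0, "edge"  # inputs originate on the edge device
    for (name, edge_ms, cloud_ms, input_kb), loc in zip(stages, plan):
        if loc != data_at:                 # move the stage input across the link
            cost += input_kb * LINK_MS_PER_KB
            data_at = loc
        cost += edge_ms if loc == "edge" else cloud_ms
    return cost

best = min(product(["edge", "cloud"], repeat=len(stages)), key=plan_cost)
print(best, round(plan_cost(best), 1))  # ('edge', 'edge', 'cloud') 93.0
```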
