Search Results for author: Shuzhang Zhong

Found 2 papers, 0 papers with code

ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

no code implementations • 21 Feb 2024 • Shuzhang Zhong, Zebin Yang, Meng Li, Ruihao Gong, Runsheng Wang, Ru Huang

Additionally, ProPD introduces a dynamic token tree generation algorithm that balances the computation and parallelism of the verification phase in real time and maximizes overall efficiency across different batch sizes, sequence lengths, and tasks.

Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization

no code implementations • 26 Aug 2023 • Shuzhang Zhong, Meng Li, Yun Liang, Runsheng Wang, Ru Huang

Memory-aware network scheduling is becoming increasingly important for deep neural network (DNN) inference on resource-constrained devices.

Scheduling
