Search Results for author: Winston Hu

Found 2 papers, 2 papers with code

Parallel Speculative Decoding with Adaptive Draft Length

1 code implementation · 13 Aug 2024 · Tianyu Liu, Yun Li, Qitan Lv, Kai Liu, Jianchen Zhu, Winston Hu

Speculative decoding (SD), in which an extra draft model first proposes multiple draft tokens and the original target model then verifies these tokens in parallel, has proven highly effective at accelerating LLM inference.
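The draft-then-verify loop described above can be illustrated with a minimal Python sketch of vanilla greedy speculative decoding. It rests on several assumptions. The "models" are hypothetical toy functions (`draft_model`, `target_model`) that map a token prefix to a greedy next token. The draft length `gamma` is fixed. The target's checks are written as a loop rather than a single batched forward pass. It does not implement the paper's parallel drafting or adaptive draft length.

```python
from typing import Callable, List

# Hypothetical toy "model": maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode_step(prefix: List[int], draft: NextToken,
                            target: NextToken, gamma: int) -> List[int]:
    """One round of greedy speculative decoding (a generic sketch).

    The draft model proposes `gamma` tokens autoregressively; the target
    model then checks every proposed position. A real system computes the
    `gamma + 1` target predictions in one parallel forward pass; here they
    are computed in a loop for clarity.
    """
    # 1. Draft phase: the cheap model proposes gamma tokens one at a time.
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(gamma):
        tok = draft(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. Verify phase: the target's greedy choice at each drafted position,
    #    plus one extra position after the last drafted token.
    target_choices = [target(prefix + drafted[:i]) for i in range(gamma + 1)]

    # 3. Keep the longest run where draft and target agree, then append the
    #    target's own token (the correction, or a bonus token if all matched).
    accepted: List[int] = []
    for i, tok in enumerate(drafted):
        if tok != target_choices[i]:
            break
        accepted.append(tok)
    accepted.append(target_choices[len(accepted)])
    return prefix + accepted


def target_model(ctx: List[int]) -> int:
    # Hypothetical target: always counts up by one.
    return ctx[-1] + 1


def draft_model(ctx: List[int]) -> int:
    # Hypothetical draft: agrees with the target except after token 3.
    return 0 if ctx[-1] == 3 else ctx[-1] + 1


print(speculative_decode_step([0], draft_model, target_model, gamma=4))
# -> [0, 1, 2, 3, 4]  (3 drafted tokens accepted, the 4th corrected by the target)
print(speculative_decode_step([0, 1, 2, 3, 4], draft_model, target_model, gamma=4))
# -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  (all 4 accepted plus 1 bonus token)
```

Because every emitted token is either confirmed or produced by the target, greedy output matches what the target alone would generate; the speedup comes from emitting several tokens per target pass when the draft guesses well.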

Text Generation
