Search Results for author: Jinwei Yao

Found 1 paper, 0 papers with code

DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference

No code implementations • 30 Mar 2024 • Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Decoding using tree search can greatly enhance the inference quality for transformer-based Large Language Models (LLMs).
