no code implementations • 16 Feb 2024 • Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi
Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary draft model.
1 code implementation • ACL 2021 • Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, Ruofei Zhang
Transformer-based models have made tremendous impacts in natural language generation.
1 code implementation • 11 May 2021 • Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang
Transformer model with multi-head attention requires caching intermediate results for efficient inference in generation tasks.