Search Results for author: Nikhil Bhendawade

Found 3 papers, 2 papers with code

Speculative Streaming: Fast LLM Inference without Auxiliary Models

no code implementations • 16 Feb 2024 • Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi

Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary draft model.
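The abstract sentence above describes classic speculative decoding, which this paper sets out to improve on. A minimal sketch of that baseline scheme, using toy stand-in "models" (plain callables returning next-token scores) rather than real LLMs — all names and the greedy-verification variant here are illustrative assumptions, not the paper's method:

```python
# Toy sketch of draft-then-verify speculative decoding.
# A "model" here is any callable mapping a token prefix to
# a dict of next-token scores (an illustrative stand-in).

def greedy_next(model, prefix):
    """Pick the highest-scoring next token from a toy model."""
    scores = model(prefix)
    return max(scores, key=scores.get)

def speculative_step(target, draft, prefix, k=4):
    """Draft k tokens cheaply, then keep the prefix the target agrees with."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed = []
    ctx = list(prefix)
    for _ in range(k):
        tok = greedy_next(draft, ctx)
        proposed.append(tok)
        ctx.append(tok)
    # 2. The large target model verifies the proposals; in a real system
    #    this verification is batched into a single forward pass.
    accepted = []
    ctx = list(prefix)
    for tok in proposed:
        if greedy_next(target, ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    # 3. On a mismatch (or after accepting all k), emit one token from
    #    the target so every step makes progress.
    accepted.append(greedy_next(target, ctx))
    return accepted
```

When the draft agrees with the target, each step yields up to k + 1 tokens for roughly one target-model pass; when it disagrees early, the step degrades to ordinary decoding.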

Language Modelling

EL-Attention: Memory Efficient Lossless Attention for Generation

1 code implementation • 11 May 2021 • Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang

Transformer models with multi-head attention require caching intermediate results for efficient inference in generation tasks.
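The caching the abstract refers to is the standard key/value cache that grows with every generated token — the memory cost EL-Attention targets. A minimal single-head sketch of that baseline caching, with illustrative shapes and names (not the paper's EL-Attention itself):

```python
# Sketch of the conventional key/value caching used by multi-head
# attention during generation. One head, no projections for the query,
# shapes are illustrative.
import numpy as np

def attend(q, k_cache, v_cache):
    """One query attends over all cached keys/values (shape (t, d))."""
    scores = k_cache @ q                   # (t,) dot products with past keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over past positions
    return weights @ v_cache               # (d,) weighted sum of values

def generate_step(x, W_k, W_v, k_cache, v_cache):
    """Append this step's key/value to the cache, then attend."""
    k_cache.append(W_k @ x)  # cached per layer, per head, per step --
    v_cache.append(W_v @ x)  # this is the memory that grows with length
    return attend(x, np.stack(k_cache), np.stack(v_cache))
```

Per layer and head, the cache stores two d-dimensional vectors for every past position, which is why lossless reductions of it (the paper's goal) matter for long generations.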

Question Generation
