Search Results for author: Anurag Khandelwal

Found 2 papers, 0 papers with code

Prompt Cache: Modular Attention Reuse for Low-Latency Inference

no code implementations7 Nov 2023 In Gim, Guojun Chen, Seung-seob Lee, Nikhil Sarda, Anurag Khandelwal, Lin Zhong

We present Prompt Cache, an approach for accelerating inference for large language models (LLM) by reusing attention states across different LLM prompts.

Question Answering

Cloud Programming Simplified: A Berkeley View on Serverless Computing

no code implementations9 Feb 2019 Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, David A. Patterson

Serverless cloud computing handles virtually all the system administration operations needed to make it easier for programmers to use the cloud.

Operating Systems

Cannot find the paper you are looking for? You can Submit a new open access paper.