Search Results for author: Nikhil Sarda

Found 4 papers, 0 papers with code

Prompt Cache: Modular Attention Reuse for Low-Latency Inference

no code implementations • 7 Nov 2023 • In Gim, Guojun Chen, Seung-seob Lee, Nikhil Sarda, Anurag Khandelwal, Lin Zhong

We present Prompt Cache, an approach for accelerating inference for large language models (LLMs) by reusing attention states across different LLM prompts.

Question Answering
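The abstract above describes reusing attention (key/value) states for prompt segments that recur across different prompts. A minimal toy sketch of that caching structure follows; the "attention state" here is a stand-in value and `attention_states` is a hypothetical function, purely to illustrate how a reused prompt module skips recomputation. This is not the paper's implementation.

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def attention_states(module_text: str) -> tuple:
    # Stand-in for an expensive per-token attention computation;
    # real systems would cache key/value tensors, not token hashes.
    return tuple(hash(tok) for tok in module_text.split())

def run_prompt(modules: list) -> list:
    # A prompt is assembled from reusable text modules; any module seen
    # before is served from the cache instead of being recomputed.
    return [attention_states(m) for m in modules]

system = "You are a helpful assistant."
run_prompt([system, "Summarize this article."])
run_prompt([system, "Translate this sentence."])  # system module is reused

assert attention_states.cache_info().hits >= 1
```

The design point is that caching is keyed by module, not by whole prompt, so two prompts that share only a prefix or a common segment can still share work.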

SmartChoices: Hybridizing Programming and Machine Learning

no code implementations • ICLR 2019 • Victor Carbune, Thierry Coppey, Alexander Daryin, Thomas Deselaers, Nikhil Sarda, Jay Yagnik

Unlike previous work applying ML to algorithmic problems, our proposed approach does not require dropping existing implementations; it integrates seamlessly into the standard software development workflow and gives the software developer full control over how ML methods are applied.

BIG-bench Machine Learning • Reinforcement Learning (RL)
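The abstract sketches a learned value embedded in ordinary code, with the existing implementation kept as a fallback. A hypothetical illustration of that interface is below: `SmartChoice`, its epsilon-greedy policy, and the buffer-size example are all assumptions for illustration, not the paper's API.

```python
import random

class SmartChoice:
    """Hypothetical learned value with a programmer-supplied default,
    so existing code keeps working before any feedback arrives."""

    def __init__(self, default, candidates):
        self.default = default
        self.rewards = {c: 0.0 for c in candidates}
        self.counts = {c: 0 for c in candidates}

    def get(self, epsilon=0.1):
        # Before any feedback, fall back to the developer's default.
        if all(n == 0 for n in self.counts.values()):
            return self.default
        # Epsilon-greedy: occasionally explore, otherwise exploit the
        # candidate with the best observed average reward.
        if random.random() < epsilon:
            return random.choice(list(self.rewards))
        return max(self.rewards,
                   key=lambda c: self.rewards[c] / max(self.counts[c], 1))

    def feedback(self, choice, reward):
        self.counts[choice] += 1
        self.rewards[choice] += reward

# Usage: replace a hard-coded buffer size with a learned one.
buf = SmartChoice(default=4096, candidates=[1024, 4096, 16384])
size = buf.get()         # no feedback yet, so this is the default: 4096
buf.feedback(size, 1.0)  # report observed performance after use
```

The fallback-to-default behavior is what lets the learned value drop into an existing code path without changing its semantics on day one.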
