Search Results for author: Arjun Balasubramanian

Found 2 papers, 0 papers with code

Accelerating Deep Learning Inference via Learned Caches

no code implementations • 18 Jan 2021 • Arjun Balasubramanian, Adarsh Kumar, YuHan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference.
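The abstract's core idea can be sketched in a few lines: a "learned cache" replaces exact-match lookup with a small model that predicts the final output directly from an intermediate DNN activation, so confident requests skip the remaining layers. The shapes, the linear cache model, and the confidence threshold below are illustrative assumptions, not GATI's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
W_tail = rng.standard_normal((4, 3))    # stand-in for the expensive remaining DNN layers
W_cache = rng.standard_normal((4, 3))   # tiny learned "cache" model at an intermediate layer

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def infer(h, threshold=0.9):
    """Return (predicted class, served_from_cache) for intermediate activation h."""
    probs = softmax(h @ W_cache)                 # cheap learned-cache prediction
    if probs.max() >= threshold:                 # confident -> cache "hit", skip the tail
        return int(probs.argmax()), True
    return int((h @ W_tail).argmax()), False     # miss -> run the remaining layers

h = rng.standard_normal(4)                       # activation at the hypothetical cache point
pred, hit = infer(h)
```

The latency win comes from the hit path: a small matrix product and a threshold check instead of the full tail of the network.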

Accelerating Deep Learning Inference via Freezing

no code implementations • 7 Feb 2020 • Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella

In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests.

Task: Quantization
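The observation in the second abstract — that cached intermediate outputs let many requests skip the rest of the network — can be illustrated with a minimal nearest-neighbor cache over activations. The distance tolerance, shapes, and linear "tail" below are assumptions for the sketch, not the paper's mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)
W_tail = rng.standard_normal((4, 3))     # stand-in for the DNN layers after the cache point

cache_keys = []                          # previously seen intermediate activations
cache_vals = []                          # final outputs computed for them

def infer(h, tol=0.1):
    """If h is close to a cached activation, reuse its result; else run the tail."""
    for k, v in zip(cache_keys, cache_vals):
        if np.linalg.norm(h - k) < tol:  # near-duplicate activation -> cache hit
            return v, True
    out = int((h @ W_tail).argmax())     # miss: run the remaining layers
    cache_keys.append(h.copy())
    cache_vals.append(out)
    return out, False

h = rng.standard_normal(4)
out1, hit1 = infer(h)                    # first request: miss, populates the cache
out2, hit2 = infer(h + 0.01)             # near-duplicate request: served from cache
```

A linear scan is fine for a sketch; a real serving system would need an approximate nearest-neighbor index and an eviction policy to keep lookups cheap.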
