1 code implementation • 11 Oct 2023 • YuHan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, YuYang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
Compared to the recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3. 7-4. 3x and the total delay in fetching and processing contexts by 2. 7-3. 2x while having negligible impact on the LLM response quality in accuracy or perplexity.
no code implementations • 3 Oct 2023 • Kuntai Du, YuHan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, YuYang Huang, Ganesh Ananthanarayanan, Junchen Jiang
While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimum extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy, and (iii) do so for a range of configuration knobs.
no code implementations • 19 Jan 2022 • Arthi Padmanabhan, Neil Agarwal, Anand Iyer, Ganesh Ananthanarayanan, Yuanchao Shu, Nikolaos Karianakis, Guoqing Harry Xu, Ravi Netravali
Video analytics pipelines have steadily shifted to edge deployments to reduce bandwidth overheads and privacy violations, but in doing so, face an ever-growing resource tension.
no code implementations • 19 Dec 2020 • Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang, Nikolaos Karianakis, Yuanchao Shu, Kevin Hsieh, Victor Bahl, Ion Stoica
Compressed models that are deployed on the edge servers for inference suffer from data drift, where the live video data diverges from the training data.
no code implementations • 17 Jun 2020 • Rishabh Poddar, Ganesh Ananthanarayanan, Srinath Setty, Stavros Volos, Raluca Ada Popa
Video-analytics-as-a-service is becoming an important offering for cloud providers.
1 code implementation • 31 Jul 2019 • M. G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain
To address this issue, efforts have been made to place additional computing devices at the edge of the network, i. e close to the IoT devices where the data is generated.
no code implementations • 5 Jun 2019 • Krishna Narra, Zhifeng Lin, Ganesh Ananthanarayanan, Salman Avestimehr, Murali Annavaram
In this work, we argue that MLaaS platforms also provide unique opportunities to cut the cost of redundancy.
no code implementations • 27 Apr 2019 • Krishna Giri Narra, Zhifeng Lin, Ganesh Ananthanarayanan, Salman Avestimehr, Murali Annavaram
Deploying the collage-cnn models in the cloud, we demonstrate that the 99th percentile tail latency of inference can be reduced by 1. 2x to 2x compared to replication based approaches while providing high accuracy.
1 code implementation • 3 Nov 2018 • Samvit Jain, Xun Zhang, Yuhao Zhou, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, Joseph Gonzalez
Enterprises are increasingly deploying large camera networks for video analytics.
no code implementations • 7 Sep 2018 • Samvit Jain, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, Joseph E. Gonzalez
Driven by advances in computer vision and the falling costs of camera hardware, organizations are deploying video cameras en masse for the spatial monitoring of their physical premises.
no code implementations • 10 Jan 2018 • Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, Onur Mutlu
Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time.