Search Results for author: Ganesh Ananthanarayanan

Found 14 papers, 3 papers with code

RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation

no code implementations13 Dec 2024 Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang

RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge often improves generation quality at the expense of response delay.

RAG, Retrieval-augmented Generation +1
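The trade-off above is essentially a per-query configuration choice: how much retrieved context to feed the LLM within a delay budget. A minimal sketch of that idea follows; the names and constants (estimate_difficulty, DELAY_PER_CHUNK_MS) are invented for illustration and are not RAGServe's actual interface.

# Sketch: pick how many chunks to retrieve per query, trading quality for delay.
# All names and constants here are illustrative assumptions, not RAGServe's API.

DELAY_PER_CHUNK_MS = 40      # assumed marginal prefill delay per retrieved chunk

def estimate_difficulty(query: str) -> float:
    """Toy proxy: longer, multi-clause queries are treated as harder."""
    return min(1.0, len(query.split()) / 30)

def choose_num_chunks(query: str, delay_budget_ms: float) -> int:
    """Harder queries get more context, but never more than the delay budget allows."""
    max_by_budget = int(delay_budget_ms // DELAY_PER_CHUNK_MS)
    wanted = 1 + int(estimate_difficulty(query) * 7)   # 1..8 chunks
    return max(1, min(wanted, max_by_budget))

if __name__ == "__main__":
    for q in ["Who wrote Hamlet?",
              "Compare the failure-recovery strategies of Raft and Paxos "
              "and explain when each is preferable"]:
        print(q[:40], "->", choose_num_chunks(q, delay_budget_ms=200), "chunks")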

Distributed AI Platform for the 6G RAN

no code implementations1 Oct 2024 Ganesh Ananthanarayanan, Xenofon Foukas, Bozidar Radunovic, Yongguang Zhang

Cellular Radio Access Networks (RANs) are rapidly evolving towards 6G, driven by the need to reduce costs and introduce new revenue streams for operators and enterprises.

Management

EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge

no code implementations29 May 2024 ChonLam Lao, Jiaqi Gao, Ganesh Ananthanarayanan, Aditya Akella, Minlan Yu

Traditional ML inference is evolving toward modeless inference, which abstracts the complexity of model selection from users, allowing the system to automatically choose the most appropriate model for each request based on accuracy and resource requirements.

Model Selection
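A rough sketch of what "modeless" routing could look like: the serving layer, not the client, picks a profiled model variant per request from the request's accuracy and latency targets. The variant names and profiled numbers below are made up for the example and do not reflect EdgeSight's design.

# Sketch: pick the cheapest model variant that meets a request's targets.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    name: str
    accuracy: float   # offline-profiled accuracy (assumed)
    cost_ms: float    # profiled latency on the edge accelerator (assumed)

VARIANTS = [
    Variant("resnet18",  accuracy=0.70, cost_ms=6),
    Variant("resnet50",  accuracy=0.76, cost_ms=15),
    Variant("resnet152", accuracy=0.78, cost_ms=40),
]

def route(min_accuracy: float, latency_budget_ms: float) -> Optional[Variant]:
    """Cheapest variant that meets both the accuracy and latency targets."""
    ok = [v for v in VARIANTS
          if v.accuracy >= min_accuracy and v.cost_ms <= latency_budget_ms]
    return min(ok, key=lambda v: v.cost_ms) if ok else None

if __name__ == "__main__":
    print(route(min_accuracy=0.75, latency_budget_ms=20))   # -> resnet50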

CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving

2 code implementations11 Oct 2023 YuHan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, YuYang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang

Compared to the recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.5-4.3x and the total delay in fetching and processing contexts by 3.2-3.7x with negligible impact on the LLM response quality.

Language Modeling, Language Modelling +2
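The underlying idea is that the KV cache is a large tensor that can be encoded more compactly before being shipped over the network and decoded before reuse. The toy uniform 8-bit quantizer below only illustrates the size/fidelity trade-off; CacheGen's actual codec and streaming logic are more sophisticated.

# Toy sketch: quantize a KV-cache tensor to shrink it, then dequantize for reuse.
import numpy as np

def quantize(kv: np.ndarray, bits: int = 8):
    """Uniform quantization of a float32 tensor into uint8 levels."""
    lo, hi = kv.min(), kv.max()
    levels = 2 ** bits - 1
    q = np.round((kv - lo) / (hi - lo + 1e-8) * levels).astype(np.uint8)
    return q, lo, hi

def dequantize(q: np.ndarray, lo: float, hi: float, bits: int = 8) -> np.ndarray:
    levels = 2 ** bits - 1
    return q.astype(np.float32) / levels * (hi - lo) + lo

if __name__ == "__main__":
    kv = np.random.randn(2, 32, 128, 64).astype(np.float32)   # toy KV cache
    q, lo, hi = quantize(kv)
    print("size ratio:", kv.nbytes / q.nbytes)                 # 4x from fp32 -> uint8
    print("max abs error:", np.abs(dequantize(q, lo, hi) - kv).max())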

OneAdapt: Fast Configuration Adaptation for Video Analytics Applications via Backpropagation

no code implementations3 Oct 2023 Kuntai Du, YuHan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, YuYang Huang, Ganesh Ananthanarayanan, Junchen Jiang

While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimal extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy; and (iii) for a range of configuration knobs.

Deep Learning, Object Detection +1
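A toy rendering of "adaptation via backpropagation": treat a knob (here, a normalized resolution) as continuous, differentiate an accuracy-minus-bandwidth objective with respect to it, and step along the gradient. The proxy functions are invented for this sketch and stand in for OneAdapt's DNN-aware accuracy estimate; the gradient is approximated by finite differences for simplicity.

# Sketch: gradient-based tuning of a single configuration knob.

def accuracy_proxy(res: float) -> float:
    # Diminishing returns: higher resolution helps, but saturates.
    return 1.0 - (1.0 - res) ** 2

def bandwidth_cost(res: float) -> float:
    return res ** 2            # bandwidth grows roughly with pixel count

def objective(res: float, lam: float = 0.5) -> float:
    return accuracy_proxy(res) - lam * bandwidth_cost(res)

def adapt(res: float = 0.9, lr: float = 0.1, steps: int = 50) -> float:
    eps = 1e-4
    for _ in range(steps):
        grad = (objective(res + eps) - objective(res - eps)) / (2 * eps)
        res = min(1.0, max(0.05, res + lr * grad))
    return res

if __name__ == "__main__":
    print(f"chosen resolution knob: {adapt():.2f}")   # converges near 0.67 here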

GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge

no code implementations19 Jan 2022 Arthi Padmanabhan, Neil Agarwal, Anand Iyer, Ganesh Ananthanarayanan, Yuanchao Shu, Nikolaos Karianakis, Guoqing Harry Xu, Ravi Netravali

Video analytics pipelines have steadily shifted to edge deployments to reduce bandwidth overheads and privacy violations, but in doing so, face an ever-growing resource tension.

Management

Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers

no code implementations19 Dec 2020 Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang, Nikolaos Karianakis, Yuanchao Shu, Kevin Hsieh, Victor Bahl, Ion Stoica

Compressed models that are deployed on the edge servers for inference suffer from data drift, where the live video data diverges from the training data.
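A minimal sketch of the loop such continuous-learning systems automate: compare the edge model's predictions against occasional ground-truth (or "golden model") labels and trigger retraining on recent data once windowed accuracy drops. The threshold, window size, and function names are assumptions for illustration, not Ekya's scheduler.

# Sketch: detect accuracy degradation from data drift and trigger retraining.
from collections import deque
import random

RETRAIN_THRESHOLD = 0.85
WINDOW = 200
recent = deque(maxlen=WINDOW)            # sliding window of per-frame correctness

def on_labeled_frame(prediction, ground_truth, retrain_fn):
    """Record one labeled frame; retrain when windowed accuracy drops too low."""
    recent.append(prediction == ground_truth)
    if len(recent) == WINDOW and sum(recent) / WINDOW < RETRAIN_THRESHOLD:
        retrain_fn()                      # e.g. fine-tune the compressed model on recent frames
        recent.clear()

if __name__ == "__main__":
    # Simulated drift: accuracy falls from 95% to 70% halfway through the stream.
    retrains = []
    for i in range(2000):
        p_correct = 0.95 if i < 1000 else 0.70
        on_labeled_frame(random.random() < p_correct, True, lambda i=i: retrains.append(i))
    print("retraining triggered at frames:", retrains[:3])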

Machine Learning at the Network Edge: A Survey

1 code implementation31 Jul 2019 M. G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain

To address this issue, efforts have been made to place additional computing devices at the edge of the network, i.e., close to the IoT devices where the data is generated.

BIG-bench Machine Learning, Edge-computing +1

Collage Inference: Using Coded Redundancy for Low Variance Distributed Image Classification

no code implementations27 Apr 2019 Krishna Giri Narra, Zhifeng Lin, Ganesh Ananthanarayanan, Salman Avestimehr, Murali Annavaram

Deploying the collage-cnn models in the cloud, we demonstrate that the 99th percentile tail latency of inference can be reduced by 1.2x to 2x compared to replication based approaches while providing high accuracy.

Classification, Cloud Computing +4
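The "collage" in collage-cnn refers to tiling several requests' images into a single input so one redundant model invocation can cover many potentially straggling single-image requests at once. A toy version of the tiling step could look like the following; the collage-cnn model itself is not shown.

# Sketch: tile grid*grid equally-sized images into one collage input.
import numpy as np

def make_collage(images, grid=2):
    """Stack grid*grid images of shape (H, W, C) into one (grid*H, grid*W, C) image."""
    rows = [np.concatenate(images[r * grid:(r + 1) * grid], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0)

if __name__ == "__main__":
    imgs = [np.full((224, 224, 3), i, dtype=np.uint8) for i in range(4)]
    print(make_collage(imgs).shape)     # (448, 448, 3)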

Scaling Video Analytics Systems to Large Camera Deployments

no code implementations7 Sep 2018 Samvit Jain, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, Joseph E. Gonzalez

Driven by advances in computer vision and the falling costs of camera hardware, organizations are deploying video cameras en masse for the spatial monitoring of their physical premises.

Focus: Querying Large Video Datasets with Low Latency and Low Cost

no code implementations10 Jan 2018 Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, Onur Mutlu

Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time.
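The ingest/query split this describes can be sketched as a two-stage filter: a cheap, high-recall classifier builds a per-class candidate index at ingest time, and the expensive CNN verifies only those candidates when a class is actually queried. The toy classifiers below are placeholders, not Focus's actual models.

# Sketch: cheap classifier at ingest builds an index; expensive model verifies at query time.
from collections import defaultdict

candidate_index = defaultdict(list)          # class -> candidate frame ids

def ingest(frame_id, frame, cheap_topk):
    # cheap_topk(frame) -> several likely classes (low precision, high recall)
    for cls in cheap_topk(frame):
        candidate_index[cls].append(frame_id)

def query(cls, frames, expensive_classify):
    # The expensive model runs only on candidate frames, not the whole video.
    return [fid for fid in candidate_index[cls]
            if expensive_classify(frames[fid]) == cls]

if __name__ == "__main__":
    frames = {0: "car", 1: "dog", 2: "car-ish"}
    cheap_topk = lambda f: ["car"] if "car" in f else ["dog"]
    expensive = lambda f: "car" if f == "car" else "other"
    for fid, f in frames.items():
        ingest(fid, f, cheap_topk)
    print(query("car", frames, expensive))    # -> [0]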
