Search Results for author: Aditya Akella

Found 11 papers, 2 papers with code

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

no code implementations • 5 Apr 2024 • Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

In this work, we observe the saturation of the computationally expensive feed-forward blocks of LLM layers and propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs.

Attribute Hallucination +1
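The abstract excerpt above only names the idea. A minimal sketch of where such fine-grained FFN skipping could sit inside a decoder layer is shown below; the per-layer skip flags, the simplified module signatures, and the notion that the flags come from observing FFN saturation on earlier tokens are illustrative assumptions, not the paper's exact procedure.

    import torch

    @torch.no_grad()
    def decode_step(x, layers, skip_ffn):
        """One autoregressive decoding step with per-layer FFN skipping.

        `layers` is a list of (attn, ffn, norm1, norm2) module tuples and
        `skip_ffn[i]` says whether layer i's feed-forward block is treated
        as saturated and may be bypassed for this token. How those flags
        are derived (e.g. from how little the FFN changes its input on
        earlier tokens) is what the paper studies; here they are given.
        """
        for i, (attn, ffn, norm1, norm2) in enumerate(layers):
            x = x + attn(norm1(x))            # attention is never skipped
            if not skip_ffn[i]:
                x = x + ffn(norm2(x))         # run the expensive FFN block
            # else: leave the residual stream unchanged, saving the FFN FLOPs
        return x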

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression

1 code implementation • 12 Feb 2024 • Haoyu Li, Yuchen Xu, Jiayi Chen, Rohit Dwivedula, Wenfei Wu, Keqiang He, Aditya Akella, Daehyeok Kim

As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems.

Computational Efficiency

MOSEL: Inference Serving Using Dynamic Modality Selection

no code implementations • 27 Oct 2023 • Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, Aditya Akella

Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities.

CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

no code implementations • 1 Aug 2023 • Sudarsanan Rajasekaran, Manya Ghobadi, Aditya Akella

We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.

Scheduling

Auxo: Efficient Federated Learning via Scalable Client Clustering

no code implementations • 29 Oct 2022 • Jiachen Liu, Fan Lai, Yinwei Dai, Aditya Akella, Harsha Madhyastha, Mosharaf Chowdhury

In this paper, we explore an additional layer of complexity to mitigate such heterogeneity by grouping clients with statistically similar data distributions (cohorts).

Clustering Federated Learning
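As a rough illustration of the cohort idea in the excerpt above, the sketch below groups clients by clustering per-client summary statistics. Both the choice of statistic (e.g. label histograms) and the use of plain k-means are stand-ins for illustration; Auxo's scalable clustering mechanism is what the paper itself contributes.

    import numpy as np
    from sklearn.cluster import KMeans

    def form_cohorts(client_stats, num_cohorts):
        """Group clients with statistically similar data into cohorts.

        `client_stats` is a (num_clients, d) array of per-client summary
        statistics, e.g. normalized label histograms. Both the statistic
        and plain k-means are illustrative assumptions, not Auxo's
        actual clustering algorithm.
        """
        labels = KMeans(n_clusters=num_cohorts, n_init=10).fit_predict(client_stats)
        return [np.where(labels == c)[0] for c in range(num_cohorts)]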

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs

no code implementations • 22 Jul 2022 • Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, Tushar Krishna

Our results indicate that previously proposed RoCE congestion control schemes have little impact on the end-to-end performance of training workloads, motivating the necessity of designing an optimized, yet low-overhead, congestion control scheme based on the characteristics of distributed training platforms and workloads.

Blocking

Multi-agent Databases via Independent Learning

no code implementations • 28 May 2022 • Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella

Thus, the paper addresses the question: "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?"

Multi-agent Reinforcement Learning Scheduling

Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters

1 code implementation • 20 Nov 2021 • Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella

Structured partial backpropagation simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality.

Scheduling

Accelerating Deep Learning Inference via Learned Caches

no code implementations • 18 Jan 2021 • Arjun Balasubramanian, Adarsh Kumar, YuHan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference.

Accelerating Deep Learning Inference via Freezing

no code implementations • 7 Feb 2020 • Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella

In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests.

Quantization
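The last two entries both revolve around caching inside the inference path. The sketch below only shows where such a cache would sit in a layer-by-layer forward pass; the cache.lookup interface is hypothetical, and what the (learned) cache stores and how matches are decided is exactly what these papers study.

    def cached_inference(x, layers, cache):
        """Layer-by-layer forward pass that consults an activation cache
        after each layer and returns early on a hit.

        `cache.lookup(h)` is a hypothetical interface: it returns a stored
        final prediction for activations it considers close enough to one
        seen before, or None otherwise.
        """
        h = x
        for layer in layers:
            h = layer(h)
            hit = cache.lookup(h)
            if hit is not None:       # cache hit: skip all remaining layers
                return hit
        return h                      # miss: the full network ran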
