Search Results for author: Aditya Akella

Found 11 papers, 2 papers with code

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

no code implementations • 5 Apr 2024 • Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

In this work, we observe the saturation of the computationally expensive feed-forward blocks of LLM layers and propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs.

Attribute Hallucination +1
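The abstract excerpt above only names the idea. A minimal sketch of where such fine-grained FFN skipping could sit inside a decoder layer is shown below; the per-layer skip flags, the simplified module signatures, and the notion that the flags come from observing FFN saturation on earlier tokens are illustrative assumptions, not the paper's exact procedure.

    import torch

    @torch.no_grad()
    def decode_step(x, layers, skip_ffn):
        """One autoregressive decoding step with per-layer FFN skipping.

        `layers` is a list of (attn, ffn, norm1, norm2) module tuples and
        `skip_ffn[i]` says whether layer i's feed-forward block is treated
        as saturated and may be bypassed for this token. How those flags
        are derived (e.g. from how little the FFN changes its input on
        earlier tokens) is what the paper studies; here they are given.
        """
        for i, (attn, ffn, norm1, norm2) in enumerate(layers):
            x = x + attn(norm1(x))            # attention is never skipped
            if not skip_ffn[i]:
                x = x + ffn(norm2(x))         # run the expensive FFN block
            # else: leave the residual stream unchanged, saving the FFN FLOPs
        return x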

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression

1 code implementation • 12 Feb 2024 • Haoyu Li, Yuchen Xu, Jiayi Chen, Rohit Dwivedula, Wenfei Wu, Keqiang He, Aditya Akella, Daehyeok Kim

As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems.

Computational Efficiency

MOSEL: Inference Serving Using Dynamic Modality Selection

no code implementations • 27 Oct 2023 • Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, Aditya Akella

Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities.

CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

no code implementations • 1 Aug 2023 • Sudarsanan Rajasekaran, Manya Ghobadi, Aditya Akella

We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.

Scheduling

Auxo: Efficient Federated Learning via Scalable Client Clustering

no code implementations • 29 Oct 2022 • Jiachen Liu, Fan Lai, Yinwei Dai, Aditya Akella, Harsha Madhyastha, Mosharaf Chowdhury

In this paper, we explore an additional layer of complexity to mitigate such heterogeneity by grouping clients with statistically similar data distributions (cohorts).

Clustering Federated Learning
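As a rough illustration of the cohort idea in the excerpt above, the sketch below groups clients by clustering per-client summary statistics. Both the choice of statistic (e.g. label histograms) and the use of plain k-means are stand-ins for illustration; Auxo's scalable clustering mechanism is what the paper itself contributes.

    import numpy as np
    from sklearn.cluster import KMeans

    def form_cohorts(client_stats, num_cohorts):
        """Group clients with statistically similar data into cohorts.

        `client_stats` is a (num_clients, d) array of per-client summary
        statistics, e.g. normalized label histograms. Both the statistic
        and plain k-means are illustrative assumptions, not Auxo's
        actual clustering algorithm.
        """
        labels = KMeans(n_clusters=num_cohorts, n_init=10).fit_predict(client_stats)
        return [np.where(labels == c)[0] for c in range(num_cohorts)]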

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs

no code implementations • 22 Jul 2022 • Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, Tushar Krishna

Our results indicate that previously proposed RoCE congestion control schemes have little impact on the end-to-end performance of training workloads, motivating the necessity of designing an optimized, yet low-overhead, congestion control scheme based on the characteristics of distributed training platforms and workloads.

Blocking

Multi-agent Databases via Independent Learning

no code implementations • 28 May 2022 • Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella

Thus, the paper addresses the question: "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?"

Multi-agent Reinforcement Learning Scheduling

Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters

1 code implementation • 20 Nov 2021 • Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella

Structured partial backpropagation simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality.

Scheduling

Accelerating Deep Learning Inference via Learned Caches

no code implementations • 18 Jan 2021 • Arjun Balasubramanian, Adarsh Kumar, YuHan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference.

Accelerating Deep Learning Inference via Freezing

no code implementations • 7 Feb 2020 • Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella

In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests.

Quantization
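The last two entries both revolve around caching inside the inference path. The sketch below only shows where such a cache would sit in a layer-by-layer forward pass; the cache.lookup interface is hypothetical, and what the (learned) cache stores and how matches are decided is exactly what these papers study.

    def cached_inference(x, layers, cache):
        """Layer-by-layer forward pass that consults an activation cache
        after each layer and returns early on a hit.

        `cache.lookup(h)` is a hypothetical interface: it returns a stored
        final prediction for activations it considers close enough to one
        seen before, or None otherwise.
        """
        h = x
        for layer in layers:
            h = layer(h)
            hit = cache.lookup(h)
            if hit is not None:       # cache hit: skip all remaining layers
                return hit
        return h                      # miss: the full network ran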
