Search Results for author: Abhinav Bhatele

Found 10 papers, 2 papers with code

Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization

no code implementations • 18 Oct 2023 • Siddharth Singh, Zachary Sating, Abhinav Bhatele

The primary efficiency bottleneck in such optimizers is matrix inverse calculations in the preconditioning step, which are expensive to compute on GPUs.

Computational Efficiency • Second-order methods
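
The snippet above identifies explicit matrix inversion in the preconditioning step as the GPU bottleneck. As a rough, illustrative sketch only (not the Jorge algorithm; the function name, iteration count, and damping value are assumptions), one common way to sidestep an explicit inverse is a few Newton-Schulz iterations, which use nothing but matrix multiplications and therefore map well onto GPUs:

import torch

def approx_inverse_spd(A: torch.Tensor, num_iters: int = 8) -> torch.Tensor:
    # Approximate the inverse of a symmetric positive-definite matrix with
    # Newton-Schulz iterations: X <- X @ (2I - A @ X). Matmuls only, so the
    # work maps well onto GPUs, unlike an explicit torch.linalg.inv call.
    n = A.shape[0]
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    # Scale the initial guess so that ||I - A @ X0|| < 1 and the iteration converges.
    X = I / A.abs().sum(dim=1).max()
    for _ in range(num_iters):
        X = X @ (2.0 * I - A @ X)
    return X

# Hypothetical usage: precondition a gradient with the approximate inverse of a
# damped gradient-statistics matrix, as a second-order optimizer might.
G = torch.randn(256, 64)                      # per-layer gradient (illustrative shape)
A = G.T @ G + 1e-3 * torch.eye(64)            # SPD statistics matrix with damping
preconditioned = G @ approx_inverse_spd(A)    # avoids torch.linalg.inv(A)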

Modeling Parallel Programs using Large Language Models

no code implementations • 29 Jun 2023 • Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele

In this paper, we show how large language models (LLMs) can be applied to tasks specific to high performance and scientific codes.

Language Modelling

A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs

no code implementations • 22 May 2023 • Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, Abhinav Bhatele

Large communication costs are a critical bottleneck in training state-of-the-art neural networks on distributed systems.

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training

1 code implementation • 11 Mar 2023 • Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele

Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert blocks to a base model, increasing the number of parameters without impacting computational costs.
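
Since the snippet describes how an MoE layer works, here is a minimal, illustrative top-1 gated MoE layer (a sketch with assumed class names and sizes, not the paper's implementation): parameter count grows with the number of experts, while each token only runs through a single expert block.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoELayer(nn.Module):
    # Minimal top-1 gated mixture-of-experts layer: every token is routed to
    # exactly one expert MLP, so parameter count scales with num_experts while
    # per-token compute stays roughly that of a single expert.
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)        # routing probabilities
        weight, expert_idx = probs.max(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(1) * expert(x[mask])
        return out

layer = Top1MoELayer(d_model=64, d_hidden=256, num_experts=4)
tokens = torch.randn(32, 64)
print(layer(tokens).shape)   # torch.Size([32, 64])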

Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training

no code implementations • 10 Feb 2023 • Siddharth Singh, Abhinav Bhatele

Parallel training of neural networks at scale is challenging due to significant overheads arising from communication.

A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

no code implementations • 9 Nov 2021 • Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele

This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators.

AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning

no code implementations • 25 Oct 2021 • Siddharth Singh, Abhinav Bhatele

This has necessitated the development of efficient algorithms to train these neural networks in parallel on large-scale GPU-based clusters.

Analytics of Longitudinal System Monitoring Data for Performance Prediction

no code implementations • 7 Jul 2020 • Ian J. Costello, Abhinav Bhatele

In recent years, several HPC facilities have started continuous monitoring of their systems and jobs to collect performance-related data for understanding performance and operational efficiency.

Scalable Comparative Visualization of Ensembles of Call Graphs

1 code implementation • 1 Jul 2020 • Suraj P. Kesavan, Harsh Bhatia, Abhinav Bhatele, Todd Gamblin, Peer-Timo Bremer, Kwan-Liu Ma

Optimizing the performance of large-scale parallel codes is critical for efficient utilization of computing resources.

Distributed, Parallel, and Cluster Computing • Performance
