no code implementations • 12 Feb 2025 • Siddharth Singh, Prajwal Singhania, Aditya Ranjan, John Kirchenbauer, Jonas Geiping, Yuxin Wen, Neel Jain, Abhimanyu Hans, Manli Shu, Aditya Tomar, Tom Goldstein, Abhinav Bhatele
Training and fine-tuning large language models (LLMs) with hundreds of billions to trillions of parameters requires tens of thousands of GPUs and a highly scalable software stack.
1 code implementation • 7 Feb 2025 • Sean McLeish, John Kirchenbauer, David Yu Miller, Siddharth Singh, Abhinav Bhatele, Micah Goldblum, Ashwinee Panda, Tom Goldstein
Scaling laws are typically fit using a family of models with a narrow range of frozen hyper-parameter choices.
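For illustration, a minimal sketch of what such a scaling-law fit typically looks like: a saturating power law fit to (parameter count, loss) pairs. The functional form, data points, and SciPy-based fitting routine are assumptions for illustration, not the procedure used in the paper.

```python
# A minimal sketch of fitting a power-law scaling curve to (model size, loss)
# pairs. The data points and the functional form L(N) = a * N**(-alpha) + c
# are illustrative assumptions, not the fitting procedure from the paper.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, alpha, c):
    """Saturating power law: loss falls as a * N^-alpha toward a floor c."""
    return a * n_params ** (-alpha) + c

# Hypothetical (parameter count, final validation loss) measurements.
n = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
loss = np.array([3.10, 2.85, 2.62, 2.44, 2.30])

(a, alpha, c), _ = curve_fit(scaling_law, n, loss, p0=[10.0, 0.1, 1.5], maxfev=10000)
print(f"fit: loss(N) ~ {a:.2f} * N^(-{alpha:.3f}) + {c:.2f}")
# Extrapolate to an unseen model size.
print("predicted loss at 7e10 params:", scaling_law(7e10, a, alpha, c))
```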
2 code implementations • 7 Feb 2025 • Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein
We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens.
no code implementations • 19 Dec 2024 • Aman Chaturvedi, Daniel Nichols, Siddharth Singh, Abhinav Bhatele
Large Language Model (LLM)-based coding tools have been tremendously successful as software development assistants, yet they are often designed for general-purpose programming tasks and perform poorly in more specialized domains such as high performance computing.
1 code implementation • 14 Jun 2024 • Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele, Tom Goldstein
Large language models can memorize and repeat their training data, causing privacy and copyright risks.
no code implementations • 14 Jun 2024 • Vasu Singla, Kaiyu Yue, Sukriti Paul, Reza Shirkavand, Mayuka Jayawardhana, Alireza Ganjdanesh, Heng Huang, Abhinav Bhatele, Gowthami Somepalli, Tom Goldstein
Training large vision-language models requires extensive, high-quality image-text pairs.
1 code implementation • 4 Jun 2024 • Prajwal Singhania, Siddharth Singh, Shwai He, Soheil Feizi, Abhinav Bhatele
Inference on large language models (LLMs) can be expensive in terms of the compute and memory costs involved, especially when long sequence lengths are used.
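To make the memory side of that cost concrete, here is a back-of-the-envelope sketch of how the key/value cache grows with sequence length; the model dimensions are hypothetical and the calculation is a generic estimate, not the analysis from the paper.

```python
# A back-of-the-envelope sketch of why long-sequence inference is memory-hungry:
# the key/value cache grows linearly with sequence length. The model dimensions
# below are hypothetical, chosen only to illustrate the trend.
def kv_cache_bytes(seq_len, batch=1, layers=40, kv_heads=40, head_dim=128,
                   bytes_per_elem=2):
    """Total bytes for keys + values cached across all layers (fp16)."""
    return 2 * batch * layers * kv_heads * head_dim * seq_len * bytes_per_elem

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"seq_len={seq_len:>7,}  KV cache ~ {gib:6.1f} GiB")
```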
1 code implementation • 27 May 2024 • Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits.
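As a rough illustration of what keeping track of digit positions could mean in practice, the sketch below tags each digit token with its offset from the least-significant digit of its number and feeds that into an extra embedding table; the character tokenization and right-aligned indexing are simplified assumptions, not the exact scheme proposed in the paper.

```python
# A simplified sketch of making each digit's position explicit: tag every digit
# token with its offset from the least-significant digit of the number it
# belongs to, which an extra embedding table can then consume. This is an
# illustrative assumption, not the paper's method.
import torch
import torch.nn as nn

def digit_position_ids(text: str) -> list[int]:
    """Offset (0 = ones place) of each digit within its contiguous digit run."""
    ids, run = [0] * len(text), []
    for i, ch in enumerate(text + " "):          # sentinel space ends the last run
        if ch.isdigit():
            run.append(i)
        else:
            for place, j in enumerate(reversed(run)):
                ids[j] = place
            run = []
    return ids[:len(text)]

print(digit_position_ids("123+4567="))  # [2, 1, 0, 0, 3, 2, 1, 0, 0]

# These ids would index an additional embedding added to the token embeddings:
digit_pos_emb = nn.Embedding(32, 512)            # up to 32 digit places, d_model=512
pos = torch.tensor(digit_position_ids("123+4567="))
extra = digit_pos_emb(pos)                       # shape: (seq_len, 512)
```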
no code implementations • 29 Apr 2024 • Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, Abhinav Bhatele
Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend on several factors, including the algorithm, its implementation, and the hardware.
1 code implementation • 23 Jan 2024 • Daniel Nichols, Joshua H. Davis, Zhaojun Xie, Arjun Rajaram, Abhinav Bhatele
Large language models are increasingly becoming a popular tool for software development.
no code implementations • 18 Oct 2023 • Siddharth Singh, Zachary Sating, Abhinav Bhatele
The primary efficiency bottleneck in such optimizers is matrix inverse calculations in the preconditioning step, which are expensive to compute on GPUs.
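A toy example of why that step dominates: the gradient is preconditioned by the inverse of a damped statistics matrix, and the explicit inverse costs O(n^3) per step. The Shampoo-style accumulator and damping value below are generic assumptions, not the optimizer studied here.

```python
# A toy illustration of the preconditioning step that dominates the cost of
# many second-order-style optimizers: the gradient is multiplied by the inverse
# of a damped curvature/statistics matrix. The specific statistics (a Shampoo-
# style G @ G^T accumulator) and damping value are assumptions for illustration.
import torch

def preconditioned_step(weight, grad, stats, lr=1e-2, damping=1e-3):
    """One update: accumulate curvature stats, invert, precondition the gradient."""
    stats += grad @ grad.T                              # running curvature estimate
    eye = torch.eye(stats.shape[0], device=stats.device)
    # The expensive part: an explicit matrix inverse, O(n^3) per step.
    precond = torch.linalg.inv(stats + damping * eye)
    weight -= lr * (precond @ grad)
    return weight, stats

w = torch.randn(256, 512)
s = torch.zeros(256, 256)
g = torch.randn_like(w)                                 # stand-in for a real gradient
w, s = preconditioned_step(w, g, s)
```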
no code implementations • 29 Jun 2023 • Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele
In this paper, we show how large language models (LLMs) can be applied to tasks specific to high performance and scientific codes.
1 code implementation • 22 May 2023 • Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, Abhinav Bhatele
This 4D approach is a hybrid of 3D tensor and data parallelism, and is implemented in the AxoNN framework.
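As a rough sketch of what a 4D layout could look like, the snippet below maps flat GPU ranks onto one data-parallel dimension times a 3D tensor-parallel grid; the grid shape, rank ordering, and helper names are illustrative assumptions, not AxoNN's actual API.

```python
# A minimal sketch of how a "4D" layout could arrange GPUs: one data-parallel
# dimension times a 3D tensor-parallel grid. The grid shape, rank ordering, and
# helper names below are illustrative assumptions, not AxoNN's actual API.
import itertools

def four_d_coords(world_size, d_data, d_x, d_y, d_z):
    """Map flat ranks 0..world_size-1 onto (data, x, y, z) grid coordinates."""
    assert world_size == d_data * d_x * d_y * d_z
    coords = list(itertools.product(range(d_data), range(d_x), range(d_y), range(d_z)))
    return {rank: c for rank, c in enumerate(coords)}

# 64 GPUs: 4-way data parallelism x a 2x2x4 tensor-parallel grid.
layout = four_d_coords(64, d_data=4, d_x=2, d_y=2, d_z=4)
print(layout[0], layout[17], layout[63])   # (0,0,0,0) (1,0,0,1) (3,1,1,3)

# Ranks sharing the same (x, y, z) form a data-parallel group that all-reduces
# gradients; ranks sharing the same `data` index form the tensor-parallel groups.
dp_group = [r for r, (d, x, y, z) in layout.items() if (x, y, z) == (0, 0, 0)]
print("data-parallel group of rank 0:", dp_group)
```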
1 code implementation • 11 Mar 2023 • Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele
Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert blocks to a base model, increasing the number of parameters without impacting computational costs.
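For concreteness, a minimal top-1-routed MoE layer is sketched below: the parameter count scales with the number of experts while each token only passes through one expert, so per-token compute stays roughly constant. This is a generic sketch, not the specific MoE architecture studied in the paper.

```python
# A minimal sparsely-activated MoE layer with top-1 routing: parameters grow
# with the number of experts, but each token is processed by only one expert.
# Generic sketch only, not the architecture evaluated in the paper.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)
        gate, expert_idx = scores.max(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():                     # only routed tokens hit expert i
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(Top1MoE()(tokens).shape)                 # torch.Size([16, 512])
```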
no code implementations • 10 Feb 2023 • Siddharth Singh, Abhinav Bhatele
Parallel training of neural networks at scale is challenging due to significant overheads arising from communication.
no code implementations • 9 Nov 2021 • Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele
This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators.
no code implementations • 25 Oct 2021 • Siddharth Singh, Abhinav Bhatele
This has necessitated the development of efficient algorithms to train these neural networks in parallel on large-scale GPU-based clusters.
no code implementations • 7 Jul 2020 • Ian J. Costello, Abhinav Bhatele
In recent years, several HPC facilities have started continuous monitoring of their systems and jobs to collect performance-related data for understanding performance and operational efficiency.
1 code implementation • 1 Jul 2020 • Suraj P. Kesavan, Harsh Bhatia, Abhinav Bhatele, Todd Gamblin, Peer-Timo Bremer, Kwan-Liu Ma
Optimizing the performance of large-scale parallel codes is critical for efficient utilization of computing resources.
Distributed, Parallel, and Cluster Computing • Performance