Search Results for author: Daiyaan Arfeen

Found 7 papers, 3 papers with code

Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training

no code implementations • 8 Apr 2025 • Daiyaan Arfeen, Dheevatsa Mudigere, Ankit More, Bhargava Gopireddy, Ahmet Inci, Gregory R. Ganger

We also propose a rack design with improved electrical and thermal capabilities to sustain power-boosting of scale-up domains that have experienced failures; combined with NTP, this allows the DP replica with the reduced TP degree (i.e., with failed GPUs) to keep up with the others, achieving near-zero throughput loss for large-scale LLM training.
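
For intuition, a back-of-envelope model of the balancing act NTP and power-boosting perform: if healthy DP replicas run at TP degree 8 and a failed domain drops to 7 GPUs, the survivors need roughly an 8/7 boost to keep pace. The linear throughput-vs-boost model and the numbers below are illustrative assumptions, not the paper's.

```python
# Illustrative model of Nonuniform Tensor Parallelism (NTP): a scale-up
# domain that loses GPUs runs its DP replica at a reduced TP degree, and
# the surviving GPUs are power-boosted to avoid straggling the job.
# Assumes per-GPU throughput scales linearly with the boost factor.

def required_boost(healthy_tp: int, degraded_tp: int) -> float:
    """Boost factor each surviving GPU needs so that a replica with
    degraded_tp GPUs matches the iteration time of a healthy replica
    with healthy_tp GPUs."""
    return healthy_tp / degraded_tp

if __name__ == "__main__":
    healthy_tp = 8       # TP degree of a healthy scale-up domain
    degraded_tp = 7      # one GPU in the domain has failed
    boost = required_boost(healthy_tp, degraded_tp)
    print(f"Surviving GPUs must sustain ~{boost:.2f}x boost "
          f"to keep up with the other DP replicas")
```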

PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training

no code implementations • 23 Sep 2024 • Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, Gregory R. Ganger, Yida Wang

Unfortunately, PP model training can use GPUs inefficiently, especially at large scale, due to idle GPU time caused by pipeline bubbles, which often amount to 15-30% of the training job's GPU allocation and can exceed 60%.
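
The quoted bubble figures follow from the standard bubble-fraction formula for a GPipe-style schedule, (p - 1) / (m + p - 1) for p pipeline stages and m micro-batches; the short calculation below reproduces them. This is background context, not PipeFill's contribution.

```python
# Back-of-envelope pipeline-bubble fraction for a GPipe-style schedule:
# with p stages and m micro-batches, idle ("bubble") time is roughly
# (p - 1) / (m + p - 1) of each training iteration.

def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    return (num_stages - 1) / (num_microbatches + num_stages - 1)

for p, m in [(8, 32), (16, 32), (32, 16)]:
    print(f"stages={p:2d} microbatches={m:2d} -> "
          f"bubble ~ {bubble_fraction(p, m):.0%}")
# stages= 8 microbatches=32 -> bubble ~ 18%
# stages=16 microbatches=32 -> bubble ~ 32%
# stages=32 microbatches=16 -> bubble ~ 66%
```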

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

no code implementations • 24 Jun 2024 • Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies.
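
A toy illustration of what a "graph of stages" buys over a linear chain: independent branches of the operator DAG can occupy concurrent pipeline stages instead of being serialized. The DAG and the simple level-grouping below are illustrative stand-ins, not GraphPipe's actual partitioner or schedule optimizer.

```python
# Toy sketch of graph pipeline parallelism (GPP): group independent
# operators into levels that can run as concurrent stages. A linear
# pipeline would force the parallel branches b and c into sequence.

from collections import defaultdict

# Hypothetical operator DAG: op -> list of successors.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def depth_levels(dag):
    """Group ops by depth from the source; ops in the same level are
    mutually independent and can be placed in concurrent stages."""
    indeg = defaultdict(int)
    for u in dag:
        for v in dag[u]:
            indeg[v] += 1
    level = [u for u in dag if indeg[u] == 0]
    levels = []
    while level:
        levels.append(level)
        nxt = []
        for u in level:
            for v in dag[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        level = nxt
    return levels

print(depth_levels(dag))  # [['a'], ['b', 'c'], ['d']] -- b, c in parallel
```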

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

3 code implementations • 16 May 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for offloading-based LLM inference, while preserving the same generative performance.

Decoder, Language Modeling
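
A minimal sketch of the tree-based verification idea behind SpecInfer: small draft models speculate a *tree* of candidate continuations, and the target LLM verifies all branches at once, accepting the longest root path it agrees with plus one bonus token of its own. The greedy target model and the nested-dict tree encoding are illustrative assumptions, not SpecInfer's API.

```python
# Sketch of token-tree verification for speculative decoding.

def verify_token_tree(prefix, tree, target_next):
    """tree: dict mapping a token to its subtree of speculated children.
    target_next: callable returning the target model's greedy next token
    for a prefix. Returns the accepted continuation."""
    accepted = []
    node = tree
    while True:
        want = target_next(prefix + accepted)
        accepted.append(want)
        if node and want in node:   # speculated branch verified: descend
            node = node[want]
        else:                       # mismatch or tree exhausted: the
            break                   # target's own token is the bonus token
    return accepted

# Toy target model that deterministically continues 1, 2, 3, ...
target = lambda toks: (toks[-1] + 1) if toks else 1
tree = {1: {2: {}, 7: {}}, 5: {}}   # speculated tree with two root guesses
print(verify_token_tree([], tree, target))  # [1, 2, 3]
```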

Unsupervised Projection Networks for Generative Adversarial Networks

no code implementations • 30 Sep 2019 • Daiyaan Arfeen, Jesse Zhang

We propose the use of unsupervised learning to train projection networks that project onto the latent space of an already trained generator.

Clustering, Image Super-Resolution
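
A hedged sketch of the setup the abstract describes: with a pre-trained generator G frozen, a projection network E is trained unsupervised so that G(E(x)) reconstructs x, i.e. E projects inputs onto G's latent space. The tiny MLPs, dimensions, and plain L2 reconstruction loss are illustrative choices, not the paper's architecture.

```python
# Unsupervised training of a projection network against a frozen generator.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim))    # stand-in for a trained GAN generator
for p in G.parameters():
    p.requires_grad_(False)                    # generator stays frozen

E = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(),
                  nn.Linear(128, latent_dim))  # projection network to train
opt = torch.optim.Adam(E.parameters(), lr=1e-3)

for step in range(200):
    x = G(torch.randn(32, latent_dim))         # samples known to lie on G's manifold
    loss = ((G(E(x)) - x) ** 2).mean()         # reconstruct through the generator
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final reconstruction loss: {loss.item():.4f}")
```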

Large batch size training of neural networks with adversarial training and second-order information

1 code implementation • ICLR 2019 • Zhewei Yao, Amir Gholami, Daiyaan Arfeen, Richard Liaw, Joseph Gonzalez, Kurt Keutzer, Michael Mahoney

Our method exceeds the performance of existing solutions in terms of both accuracy and the number of SGD iterations (up to 1% and 5x, respectively).

Second-order methods
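
A sketch of the adversarial-training ingredient used here to regularize large-batch training: each batch is perturbed with an FGSM-style step before the weight update. The paper additionally uses second-order (Hessian) information to adapt batch size and step size; that part is omitted below, and the model, epsilon, and data are placeholders.

```python
# One large-batch training step with an FGSM-style adversarial perturbation.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(512, 10), torch.randint(0, 2, (512,))  # stand-in large batch
eps = 0.05

# Compute the input-gradient sign and take one ascent step on the inputs.
x_adv = x.clone().requires_grad_(True)
F.cross_entropy(model(x_adv), y).backward()
x_adv = (x + eps * x_adv.grad.sign()).detach()

# Update the weights on the adversarially perturbed batch.
opt.zero_grad()
F.cross_entropy(model(x_adv), y).backward()
opt.step()
```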
