Search Results for author: Chih-Chieh Yang

Found 5 papers, 0 papers with code

TP-Aware Dequantization

no code implementations • 15 Jan 2024 • Adnan Hoque, Mudhakar Srivatsa, Chih-Chieh Yang, Raghu Ganti

In this paper, we present a novel method that reduces model inference latency during distributed deployment of Large Language Models (LLMs).

Quantization
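The paper itself targets tensor-parallel deployment of quantized LLMs; as background, the basic W4 dequantization step it optimizes can be sketched as follows. This is a generic illustration, not the paper's TP-aware layout or kernel: the function name, packing scheme, and group size are assumptions.

```python
import numpy as np

def dequantize_w4(packed, scales, zeros, group_size=128):
    """Unpack 4-bit integer weights to float16 using per-group
    scales and zero points (generic sketch, not the paper's kernel)."""
    # Each uint8 holds two 4-bit values: low nibble first, then high.
    lo = packed & 0x0F
    hi = packed >> 4
    w = np.empty(packed.shape[0] * 2, dtype=np.float16)
    w[0::2] = lo
    w[1::2] = hi
    # Per-group affine dequantization: w_fp = (q - zero) * scale
    for g in range(w.shape[0] // group_size):
        sl = slice(g * group_size, (g + 1) * group_size)
        w[sl] = (w[sl] - zeros[g]) * scales[g]
    return w
```

The paper's contribution is arranging the quantized weights so that this step needs no inter-GPU communication under tensor parallelism; the arithmetic above is unchanged.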

Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition

no code implementations • 5 Jan 2024 • Adnan Hoque, Less Wright, Chih-Chieh Yang, Mudhakar Srivatsa, Raghu Ganti

Our implementation shows improvement for the type of skinny matrix-matrix multiplications found in foundation model inference workloads.
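The idea behind SplitK can be sketched in plain NumPy: when M is small (a "skinny" GEMM, as in single-batch inference), splitting the K dimension exposes extra parallelism. This is a minimal serial sketch of the decomposition only; in the paper's Triton kernel the K-slices run as separate thread blocks that combine partials with atomic adds.

```python
import numpy as np

def splitk_matmul(a, b, split_k=4):
    """Compute C = A @ B by splitting the K dimension into
    `split_k` independent slices and summing the partial products."""
    m, k = a.shape
    assert k % split_k == 0
    chunk = k // split_k
    c = np.zeros((m, b.shape[1]), dtype=a.dtype)
    for i in range(split_k):
        s = slice(i * chunk, (i + 1) * chunk)
        c += a[:, s] @ b[s, :]  # partial product for this K-slice
    return c
```

Each K-slice is an independent unit of work, so a GPU can keep more SMs busy even when the M x N output tile count alone is too small to saturate the device.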

AI-aided multiscale modeling of physiologically-significant blood clots

no code implementations • 25 May 2022 • Yicong Zhu, Changnian Han, Peng Zhang, Guojing Cong, James R. Kozloski, Chih-Chieh Yang, Leili Zhang, Yuefan Deng

We have developed an AI-aided multiple time stepping (AI-MTS) algorithm and multiscale modeling framework (AI-MSM) and implemented them on the Summit-like supercomputer, AIMOS.
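Multiple time stepping integrates fast and slow force components at different time steps. The following is a generic RESPA-style sketch of one such step, not the paper's AI-MTS method: there, a learned model chooses the substep schedule adaptively, which this fixed `n_sub` does not capture.

```python
def mts_step(x, v, f_fast, f_slow, dt, n_sub=4):
    """One multiple-time-stepping step (RESPA-style sketch).

    Slow forces get one outer step of size dt; fast forces are
    integrated with n_sub velocity-Verlet substeps of size dt/n_sub.
    """
    v += 0.5 * dt * f_slow(x)      # half kick from slow forces
    h = dt / n_sub
    for _ in range(n_sub):         # inner loop over fast forces
        v += 0.5 * h * f_fast(x)
        x += h * v
        v += 0.5 * h * f_fast(x)
    v += 0.5 * dt * f_slow(x)      # closing half kick from slow forces
    return x, v
```

The payoff is that the expensive slow forces (e.g. long-range interactions) are evaluated only once per outer step while the cheap fast forces keep the integration stable.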

Accelerating Data Loading in Deep Neural Network Training

no code implementations • 2 Oct 2019 • Chih-Chieh Yang, Guojing Cong

Our model suggests that I/O rate limits the scalability of distributed training, which inspires us to design a locality-aware data loading method.
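The core of a locality-aware loading scheme is to hand each sample to a worker on the node that already holds it, so training reads hit local storage instead of the shared filesystem. The helper below is a hypothetical sketch of that assignment step, not the paper's scheduler, which also has to balance load and preserve shuffling.

```python
def locality_aware_assignment(sample_locations, workers):
    """Assign samples to workers, preferring a worker on the node
    that caches the sample (hypothetical sketch).

    sample_locations: dict sample_id -> node holding a local copy
    workers:          dict worker_id -> node the worker runs on
    """
    by_node = {}
    for w, node in workers.items():
        by_node.setdefault(node, []).append(w)
    assignment = {w: [] for w in workers}
    remote = []  # samples with no local worker
    for s, node in sorted(sample_locations.items()):
        local = by_node.get(node)
        if local:
            # least-loaded worker on the sample's own node
            w = min(local, key=lambda w: len(assignment[w]))
            assignment[w].append(s)
        else:
            remote.append(s)
    # fall back to load balancing for samples cached on other nodes
    for s in remote:
        w = min(assignment, key=lambda w: len(assignment[w]))
        assignment[w].append(s)
    return assignment
```

Under the paper's model, the fewer remote reads this produces, the less the shared I/O rate caps scalability of distributed training.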
