no code implementations • 15 Jan 2024 • Adnan Hoque, Mudhakar Srivatsa, Chih-Chieh Yang, Raghu Ganti
In this paper, we present a novel method that reduces model inference latency during distributed deployment of Large Language Models (LLMs).
no code implementations • 5 Jan 2024 • Adnan Hoque, Less Wright, Chih-Chieh Yang, Mudhakar Srivatsa, Raghu Ganti
Our implementation improves performance for the skinny matrix-matrix multiplications found in foundation model inference workloads.
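To illustrate the shape regime the abstract refers to (the paper's kernel implementation is not reproduced here), a "skinny" GEMM in decoder inference multiplies a small activation batch against a large weight matrix; the sizes below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical shapes: during token-by-token LLM decoding the activation
# matrix has very few rows (M), while the weight matrix is large (K x N).
M, K, N = 8, 4096, 11008   # illustrative sizes, not from the paper

A = np.random.rand(M, K).astype(np.float32)  # activations: few rows ("skinny")
B = np.random.rand(K, N).astype(np.float32)  # weights: large

C = A @ B                   # the skinny matrix-matrix multiplication
assert C.shape == (M, N)
```

Because M is tiny relative to K and N, such multiplications are typically memory-bandwidth-bound rather than compute-bound, which is why they benefit from specialized kernels.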
no code implementations • 25 May 2022 • Yicong Zhu, Changnian Han, Peng Zhang, Guojing Cong, James R. Kozloski, Chih-Chieh Yang, Leili Zhang, Yuefan Deng
We have developed an AI-aided multiple time stepping (AI-MTS) algorithm and multiscale modeling framework (AI-MSM) and implemented them on the Summit-like supercomputer, AIMOS.
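For readers unfamiliar with multiple time stepping, a generic r-RESPA-style integrator sketch is shown below; it is not the paper's AI-MTS algorithm (which uses an AI model to adapt the stepping), only the classical scheme it builds on, with all names and parameters being assumptions:

```python
def mts_step(x, v, fast_force, slow_force, dt_outer, n_inner, mass=1.0):
    """One multiple-time-stepping step (r-RESPA-style sketch, not AI-MTS):
    the slow force is applied with the large step dt_outer, while the
    fast force is sub-stepped n_inner times with velocity Verlet."""
    dt_inner = dt_outer / n_inner
    v = v + 0.5 * dt_outer * slow_force(x) / mass   # half kick, slow force
    for _ in range(n_inner):                        # inner velocity-Verlet loop
        v = v + 0.5 * dt_inner * fast_force(x) / mass
        x = x + dt_inner * v
        v = v + 0.5 * dt_inner * fast_force(x) / mass
    v = v + 0.5 * dt_outer * slow_force(x) / mass   # half kick, slow force
    return x, v

# Toy usage: two harmonic forces with different stiffness (illustrative only).
x, v = 1.0, 0.0
for _ in range(100):
    x, v = mts_step(x, v,
                    fast_force=lambda x: -4.0 * x,   # stiff, cheap force
                    slow_force=lambda x: -0.5 * x,   # soft, expensive force
                    dt_outer=0.1, n_inner=10)
```

The point of the splitting is that the expensive slow force is evaluated only once per outer step, while stability is preserved by sub-stepping the stiff force.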
no code implementations • 27 Nov 2020 • Leili Zhang, Giacomo Domeniconi, Chih-Chieh Yang, Seung-gu Kang, Ruhong Zhou, Guojing Cong
Drug discovery is a multi-stage process comprising two major, costly steps: pre-clinical research and clinical trials.
no code implementations • 2 Oct 2019 • Chih-Chieh Yang, Guojing Cong
Our model suggests that I/O rate limits the scalability of distributed training, motivating our design of a locality-aware data loading method.
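A generic sketch of the idea behind locality-aware loading (the paper's exact method is not reproduced; the function and parameters below are hypothetical) is to give each worker a fixed shard of the dataset that stays in node-local storage, shuffling only within the shard so that no epoch triggers cross-node reads:

```python
import random

def locality_aware_shards(sample_ids, num_workers, seed=0):
    """Partition the dataset so each worker repeatedly reads the same
    contiguous shard (cacheable in node-local storage), shuffling only
    within the shard. A sketch, not the paper's algorithm."""
    rng = random.Random(seed)
    shard_size = len(sample_ids) // num_workers
    shards = []
    for w in range(num_workers):
        shard = list(sample_ids[w * shard_size:(w + 1) * shard_size])
        rng.shuffle(shard)   # local shuffle: randomness without remote I/O
        shards.append(shard)
    return shards
```

The trade-off relative to global shuffling is reduced randomness across epochs in exchange for eliminating the remote-I/O traffic that the performance model identifies as the scalability bottleneck.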