Search Results for author: Fangyu Wang

Found 2 papers, 0 papers with code

FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization

no code implementations • 28 Feb 2024 • Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan

The 4-bit matrix multiplication introduced in the FlattenQuant method can effectively address the compute-bound bottleneck caused by large matrix computations.


Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment

no code implementations • 6 Dec 2023 • Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Yuanyuan Wang, Fu Wu, Jiezhong Qiu, Aimin Pan

Large language models (LLMs) such as GPT-3, OPT, and LLaMA have demonstrated remarkable accuracy in a wide range of tasks.

Llama Scheduling

