Search Results for author: Binzhang Fu

Found 1 paper, 0 papers with code

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

no code implementations • 7 Jun 2024 • Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection.

Anomaly Detection
