no code implementations • 11 Feb 2025 • Xin Tan, Yuetao Chen, Yimin Jiang, Xing Chen, Kun Yan, Nan Duan, Yibo Zhu, Daxin Jiang, Hong Xu
Diffusion Transformers (DiTs) have shown remarkable performance in modeling and generating high-quality videos.
no code implementations • 6 Feb 2025 • Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen, Yanbo Yu, Yichen Shen, Yibo Zhu, Daxin Jiang
We propose InfinitePOD, a novel transceiver-centric HBD architecture that unifies connectivity and dynamic switching at the transceiver level using Optical Circuit Switching (OCS).
no code implementations • 20 Sep 2024 • Yinmin Zhong, Zili Zhang, Bingyang Wu, Shengyu Liu, Yukun Chen, Changyi Wan, Hanpeng Hu, Lei Xia, Ranchen Ming, Yibo Zhu, Xin Jin
RLHFuse breaks the traditional view of RLHF workflow as a composition of individual tasks, splitting each task into finer-grained subtasks, and performing stage fusion to improve GPU utilization.
1 code implementation • 2 Jul 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu
A number of production deep learning clusters have attempted to use inference hardware for DNN training during off-peak serving hours, when many inference GPUs sit idle.
1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu
Considering the large space of DNN models and devices that impede direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices.
1 code implementation • 6 Oct 2022 • Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu
In this paper, we present ByteTransformer, a high-performance transformer boosted for variable-length inputs.
no code implementations • 28 May 2022 • Zhuang Wang, Haibin Lin, Yibo Zhu, T. S. Eugene Ng
It first designs a decision tree abstraction to express all compression strategies, then develops empirical models to estimate the timelines of tensor computation, communication, and compression, enabling ByteComp to derive the intricate interactions among tensors.
no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.
no code implementations • 16 Feb 2022 • Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang
We introduce Aryl, a new cluster scheduler to address these problems.
no code implementations • 16 Dec 2021 • Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.
no code implementations • 25 Oct 2021 • Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu
Today's auto-tuners (e.g., AutoTVM, Ansor) generate efficient tensor programs by navigating a large search space to identify effective implementations, but they treat hardware details as opaque.
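The kind of search these auto-tuners perform can be illustrated with a minimal, hypothetical sketch: enumerate candidate tile sizes for a loop nest and keep the one a cost model scores best. The cost model and candidate values below are purely illustrative, not taken from AutoTVM or Ansor, which search far larger spaces and measure candidates on real hardware.

```python
# Illustrative sketch of search-based auto-tuning: score candidate tile
# sizes with a toy cost model and keep the best. All names and numbers
# here are hypothetical.

def toy_cost(tile: int, matrix_size: int = 1024, cache_limit: int = 64) -> float:
    """Hypothetical cost: reject tiles that overflow the 'cache' and
    penalize tiles too small to amortize loop overhead."""
    if tile > cache_limit:
        return float("inf")      # candidate does not fit in cache
    reuse = tile                 # larger tiles -> more data reuse
    trips = matrix_size / tile   # smaller tiles -> more loop trips
    return trips / reuse

def tune(candidates):
    """Exhaustively search the candidate space, returning the best tile."""
    return min(candidates, key=toy_cost)

best = tune([4, 8, 16, 32, 64, 128])  # -> 64
```

Real tuners replace the analytical toy cost with measured runtimes (or a learned cost model) and use guided search rather than exhaustive enumeration, but the select-by-cost loop is the same.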
no code implementations • 18 Sep 2021 • Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo
With MIG, the A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs).
1 code implementation • ICLR 2021 • Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy
This mutual-training process between BO and the loss-prediction model allows us to limit the training steps invested in the BO search.
no code implementations • 2 Jul 2020 • Brian S. Lee, Bumho Kim, Alexandre P. Freitas, Aseema Mohanty, Yibo Zhu, Gaurang R. Bhatt, James Hone, Michal Lipson
High performance integrated electro-optic modulators operating at low temperature are critical for optical interconnects in cryogenic applications.
Applied Physics Optics