Search Results for author: Yibo Zhu

Found 14 papers, 4 papers with code

DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training

no code implementations • 11 Feb 2025 • Xin Tan, Yuetao Chen, Yimin Jiang, Xing Chen, Kun Yan, Nan Duan, Yibo Zhu, Daxin Jiang, Hong Xu

Diffusion Transformers (DiTs) have shown remarkable performance in modeling and generating high-quality videos.

InfinitePOD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers

no code implementations • 6 Feb 2025 • Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen, Yanbo Yu, Yichen Shen, Yibo Zhu, Daxin Jiang

We propose InfinitePOD, a novel transceiver-centric HBD architecture that unifies connectivity and dynamic switching at the transceiver level using Optical Circuit Switching (OCS).

Large Language Model

RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion

no code implementations • 20 Sep 2024 • Yinmin Zhong, Zili Zhang, Bingyang Wu, Shengyu Liu, Yukun Chen, Changyi Wan, Hanpeng Hu, Lei Xia, Ranchen Ming, Yibo Zhu, Xin Jin

RLHFuse breaks the traditional view of RLHF workflow as a composition of individual tasks, splitting each task into finer-grained subtasks, and performing stage fusion to improve GPU utilization.

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

1 code implementation • 2 Jul 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu

A number of production deep learning clusters have attempted to explore inference hardware for DNN training, at the off-peak serving hours with many inference GPUs idling.

Quantization

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

Considering the large space of DNN models and devices that impede direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices.

Domain Adaptation, Prediction

ByteComp: Revisiting Gradient Compression in Distributed Training

no code implementations • 28 May 2022 • Zhuang Wang, Haibin Lin, Yibo Zhu, T. S. Eugene Ng

It first designs a decision tree abstraction to express all the compression strategies and develops empirical models to timeline tensor computation, communication, and compression to enable ByteComp to derive the intricate interactions among tensors.

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo

Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

no code implementations • 16 Dec 2021 • Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo

Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.

Graph Property Prediction, Node Classification +1

Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance

no code implementations • 25 Oct 2021 • Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu

Today's auto-tuners (e.g., AutoTVM, Ansor) generate efficient tensor programs by navigating a large search space to identify effective implementations, but they do so with opaque hardware details.

High performance integrated graphene electro-optic modulator at cryogenic temperature

no code implementations • 2 Jul 2020 • Brian S. Lee, Bumho Kim, Alexandre P. Freitas, Aseema Mohanty, Yibo Zhu, Gaurang R. Bhatt, James Hone, Michal Lipson

High performance integrated electro-optic modulators operating at low temperature are critical for optical interconnects in cryogenic applications.

Applied Physics, Optics
