no code implementations • 7 Feb 2025 • Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu
This paper introduces Goku, a state-of-the-art family of joint image-and-video generation models leveraging rectified flow Transformers to achieve industry-leading performance.
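For context, rectified flow trains the generator to predict the constant velocity along a straight path between noise and data. Below is a minimal, generic sketch of that training objective (the model interface and variable names are illustrative assumptions, not Goku's actual architecture or API):

```python
import torch

def rectified_flow_loss(model, x1):
    """Generic rectified-flow training step (illustrative only).

    x1: a batch of clean image/video latents, shape (B, ...).
    The model predicts the constant velocity x1 - x0 along the
    straight path x_t = (1 - t) * x0 + t * x1.
    """
    x0 = torch.randn_like(x1)                          # noise sample
    t = torch.rand(x1.shape[0], device=x1.device)      # one timestep per sample
    t_exp = t.view(-1, *([1] * (x1.dim() - 1)))        # broadcast over latent dims
    xt = (1.0 - t_exp) * x0 + t_exp * x1               # linear interpolation
    v_target = x1 - x0                                 # straight-line velocity
    v_pred = model(xt, t)                              # e.g., a flow Transformer
    return torch.mean((v_pred - v_target) ** 2)
```

The straight interpolation path is what distinguishes rectified flow from curved diffusion schedules and is commonly credited with enabling generation in fewer sampling steps.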
2 code implementations • 28 Sep 2024 • Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu
Traditional RL can be modeled as a dataflow, where each node represents the computation of a neural network (NN) and each edge denotes a data dependency between NNs.
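To make the abstraction concrete, here is a toy sketch of an RLHF-style dataflow graph; the Node class and node names are hypothetical, not the paper's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A dataflow node: one neural-network computation (e.g., actor rollout)."""
    name: str
    deps: list = field(default_factory=list)  # upstream nodes whose outputs we consume

# RLHF-style dataflow: rollout -> reward/critic scoring -> actor update.
rollout = Node("actor_generate")
reward  = Node("reward_model_score", deps=[rollout])
critic  = Node("critic_value",       deps=[rollout])
update  = Node("actor_update",       deps=[rollout, reward, critic])

def topo_order(sinks):
    """Return nodes in dependency order (simple DFS post-order)."""
    seen, order = set(), []
    def visit(n):
        if n.name in seen:
            return
        seen.add(n.name)
        for d in n.deps:
            visit(d)
        order.append(n.name)
    for s in sinks:
        visit(s)
    return order

print(topo_order([update]))
# ['actor_generate', 'reward_model_score', 'critic_value', 'actor_update']
```

A scheduler built on such a graph can place each node on different devices or parallelism strategies while respecting the edges' data dependencies.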
no code implementations • 7 Aug 2024 • Weiqi Feng, Yangrui Chen, Shaoyu Wang, Yanghua Peng, Haibin Lin, Minlan Yu
Multimodal large language models (MLLMs) have extended the success of large language models (LLMs) to multiple data types, such as images, text, and audio, achieving strong performance in domains including multimodal translation, visual question answering, and content generation.
no code implementations • 29 Jul 2024 • Borui Wan, Mingji Han, Yiyao Sheng, Yanghua Peng, Haibin Lin, Mofan Zhang, Zhichao Lai, Menghan Yu, Junda Zhang, Zuquan Song, Xin Liu, Chuan Wu
In production, different LFMs are trained with various frameworks and storage backends, depending on model sizes and training scales.
1 code implementation • 2 Jul 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu
A number of production deep learning clusters have explored using inference hardware for DNN training during off-peak serving hours, when many inference GPUs sit idle.
1 code implementation • 2 Mar 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu
The immense sizes of LLMs have led to very high resource demand and cost for running the models.
1 code implementation • 23 Feb 2024 • Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu
Training LLMs at this scale brings unprecedented challenges to training efficiency and stability.
1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu
Because the space of DNN models and devices is too large to profile every combination directly, recent efforts focus on building a predictor that models the performance of DNN models on different devices.
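As a rough illustration of the predictor idea (the features and regressor below are assumptions made for this sketch, not the method proposed in the paper), one can regress measured latency from per-operator and per-device features, then query unprofiled combinations:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical features per (op, device) pair:
# [FLOPs, bytes moved, batch size, device peak TFLOPS, device memory bandwidth GB/s]
X_profiled = np.array([
    [1.2e9, 4.0e6, 32, 125.0, 900.0],
    [3.5e9, 8.1e6, 64, 312.0, 1555.0],
    # ... many more samples collected by direct profiling
])
y_latency_ms = np.array([0.45, 0.62])  # measured latencies for the rows above

predictor = GradientBoostingRegressor().fit(X_profiled, y_latency_ms)

# Predict latency of an (op, device) combination that was never profiled.
print(predictor.predict([[2.0e9, 6.0e6, 32, 312.0, 1555.0]]))
```

Summing such per-operator predictions over a model's graph gives an end-to-end estimate without profiling every model on every device.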
no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.
no code implementations • 16 Dec 2021 • Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems, by 20.68x on average.
1 code implementation • 13 Sep 2019 • Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin
DL2 is a DL-driven scheduler for DL clusters that expedites training jobs globally by dynamically resizing the resources allocated to each job.
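A minimal sketch of the dynamic-resizing loop follows; the policy interface and job fields are illustrative assumptions, not DL2's actual implementation:

```python
def rebalance(jobs, total_workers, policy):
    """Periodically re-divide a fixed worker pool across running jobs.

    `policy(job_states)` is assumed to return a per-job share in [0, 1]
    (e.g., the output of a learned policy); shares are normalized and
    mapped to whole workers.
    """
    shares = policy([j["state"] for j in jobs])
    total = sum(shares) or 1.0
    alloc = [int(total_workers * s / total) for s in shares]
    # Give any leftover workers to the jobs with the largest shares.
    leftover = total_workers - sum(alloc)
    for i in sorted(range(len(jobs)), key=lambda i: -shares[i])[:leftover]:
        alloc[i] += 1
    for job, n in zip(jobs, alloc):
        job["num_workers"] = n   # in a real system: trigger scale-up/down of the job
    return alloc

jobs = [{"state": [0.2], "num_workers": 0}, {"state": [0.8], "num_workers": 0}]
print(rebalance(jobs, total_workers=8, policy=lambda states: [s[0] for s in states]))
# [1, 7]
```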
1 code implementation • EuroSys 2018 • Yanghua Peng
A deep learning training job is resource-intensive and time-consuming.
no code implementations • 3 Jan 2018 • Yixin Bao, Yanghua Peng, Chuan Wu, Zongpeng Li
In a shared cluster handling multiple training jobs, a fundamental issue is how to efficiently schedule jobs and set the number of concurrent workers to run for each job, such that server resources are maximally utilized and model training can be completed in time.
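As a toy illustration of this trade-off (not the algorithm proposed in the paper), a greedy allocator can hand out workers one at a time to the job whose estimated completion time improves the most:

```python
def assign_workers(jobs, capacity):
    """Toy greedy allocation over a shared pool of `capacity` workers."""
    def remaining_time(job, w):
        if w == 0:
            return float("inf")
        speed = w / (1.0 + 0.1 * w)          # assumed sublinear scaling with workers
        return job["epochs_left"] / speed

    alloc = {j["name"]: 0 for j in jobs}
    for _ in range(capacity):
        # Pick the job with the largest marginal reduction in completion time.
        best = max(jobs, key=lambda j: remaining_time(j, alloc[j["name"]])
                                       - remaining_time(j, alloc[j["name"]] + 1))
        alloc[best["name"]] += 1
    return alloc

jobs = [{"name": "resnet", "epochs_left": 50}, {"name": "bert", "epochs_left": 120}]
print(assign_workers(jobs, capacity=6))  # e.g., {'resnet': 2, 'bert': 4}
```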