no code implementations • ICML 2020 • Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui
Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently via quantization-based compression methods, which facilitate DNN training in two ways: (1) activations are quantized to shrink memory consumption, and (2) gradients are quantized to decrease communication cost.
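Below is a minimal, generic NumPy sketch of the two uses of quantization mentioned in the snippet: activations quantized to save memory, gradients quantized before communication. It shows plain uniform quantization only and is not the compression scheme proposed in the paper; the function names are illustrative.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Uniformly quantize a float tensor to signed integers with a per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax + 1e-12
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the quantized values."""
    return q.astype(np.float32) * scale

# (1) Activations: quantize to shrink memory, dequantize before the backward pass.
activations = np.random.randn(1024, 512).astype(np.float32)
q_act, s_act = quantize_uniform(activations)

# (2) Gradients: quantize before communication, dequantize after aggregation.
gradients = np.random.randn(512, 512).astype(np.float32)
q_grad, s_grad = quantize_uniform(gradients)

print("max activation error:", np.abs(dequantize(q_act, s_act) - activations).max())
```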
no code implementations • 28 Feb 2025 • Hao Ge, Junda Feng, Qi Huang, Fangcheng Fu, Xiaonan Nie, Lei Zuo, Haibin Lin, Bin Cui, Xin Liu
The mismatch between data heterogeneity and static mesh causes redundant communication and imbalanced computation, degrading the training efficiency.
no code implementations • 10 Dec 2024 • Haoyang Li, Fangcheng Fu, Sheng Lin, Hao Ge, XuanYu Wang, Jiawen Niu, Jie Jiang, Bin Cui
To optimize large Transformer model training, efficient parallel computing and advanced data management are essential.
1 code implementation • 2 Dec 2024 • Yujie Wang, Shiju Wang, Shenhan Zhu, Fangcheng Fu, Xinyi Liu, Xuefeng Xiao, Huixia Li, Jiashi Li, Faming Wu, Bin Cui
Furthermore, we implement our method in a high-performance system that supports adaptive parallelization in distributed LLM training.
no code implementations • 8 Oct 2024 • Bozhou Li, Hao Liang, Yang Li, Fangcheng Fu, Hongzhi Yin, Conghui He, Wentao Zhang
During the pretraining phase, large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora.
no code implementations • 19 Sep 2024 • Peichao Lai, Zhengfeng Zhang, Wentao Zhang, Fangcheng Fu, Bin Cui
Recently, using large language models (LLMs) for data augmentation has led to considerable improvements in unsupervised sentence embedding models.
1 code implementation • 9 Sep 2024 • Qiang Huang, Xiao Yan, Xin Wang, Susie Xi Rao, Zhichao Han, Fangcheng Fu, Wentao Zhang, Jiawei Jiang
We also adapt the Transformer codebase to train TF-TGN efficiently with multiple GPUs.
no code implementations • 5 Sep 2024 • Yujie Wang, Shenhan Zhu, Fangcheng Fu, Xupeng Miao, Jie Zhang, Juan Zhu, Fan Hong, Yong Li, Bin Cui
Recent foundation models are capable of handling multiple tasks and multiple data modalities with a unified base model structure and several specialized model components.
no code implementations • 16 Jul 2024 • Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui
By leveraging fine-grained activation memory management, MEMO facilitates efficient training of a 7B LLM with a sequence length of 1 million on just 8 A800 GPUs, achieving an MFU of 52.30%.
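For context, the snippet below shows activation checkpointing in PyTorch, one common ingredient of activation memory management: a block's activations are discarded in the forward pass and recomputed during backward. This is only an illustrative building block, not MEMO's fine-grained policy.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Illustrative block; sizes are arbitrary, not MEMO's configuration.
layer = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)
x = torch.randn(8, 1024, requires_grad=True)

y = checkpoint(layer, x, use_reentrant=False)  # activations of `layer` are not stored
y.sum().backward()                             # they are recomputed here instead
```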
no code implementations • 1 Jul 2024 • Hailin Zhang, Xiaodong Ji, Yilin Chen, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, WeiPeng Chen, Bin Cui
During the prefilling phase, we apply PQ to tokens' keys for each LLM layer and head.
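As a rough illustration of what product quantization (PQ) of one head's key cache might look like, the sketch below splits each key vector into sub-vectors and replaces each sub-vector by its nearest k-means centroid index. The helper names (`pq_encode`, `pq_decode`) and parameters are illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_encode(keys, num_subspaces=4, num_centroids=16):
    """Product-quantize key vectors: split into sub-vectors and cluster each subspace."""
    n, d = keys.shape
    sub_dim = d // num_subspaces
    codebooks, codes = [], []
    for s in range(num_subspaces):
        sub = keys[:, s * sub_dim:(s + 1) * sub_dim]
        km = KMeans(n_clusters=num_centroids, n_init=4, random_state=0).fit(sub)
        codebooks.append(km.cluster_centers_)          # (num_centroids, sub_dim)
        codes.append(km.labels_.astype(np.uint8))      # one code per token
    return codebooks, np.stack(codes, axis=1)          # codes: (n, num_subspaces)

def pq_decode(codebooks, codes):
    """Reconstruct approximate keys from the stored centroid indices."""
    return np.concatenate(
        [codebooks[s][codes[:, s]] for s in range(len(codebooks))], axis=1
    )

# Toy example: 256 prefill tokens, head dimension 64.
keys = np.random.randn(256, 64).astype(np.float32)
books, codes = pq_encode(keys)
approx_keys = pq_decode(books, codes)  # stand-in for the exact keys at decoding time
```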
3 code implementations • 29 Feb 2024 • Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui
We first classify RAG foundations according to how the retriever augments the generator, distilling the fundamental abstractions of the augmentation methodologies for various retrievers and generators.
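One common way a retriever augments a generator is by concatenating retrieved text into the generator's prompt; the toy sketch below illustrates that pattern with a TF-IDF retriever. The corpus, query, and prompt template are made up for illustration, and the survey's other augmentation families are not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus and query, purely for illustration.
corpus = [
    "Product quantization compresses vectors with per-subspace codebooks.",
    "K-core decomposition peels low-degree vertices from a graph.",
    "Gradient boosting fits each tree to the residuals of the ensemble.",
]
query = "How does product quantization work?"

vectorizer = TfidfVectorizer().fit(corpus + [query])
scores = cosine_similarity(
    vectorizer.transform([query]), vectorizer.transform(corpus)
)[0]
top_doc = corpus[scores.argmax()]

# Query-level augmentation: the retrieved text is prepended to the generator's input.
prompt = f"Context: {top_doc}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```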
no code implementations • 24 Oct 2023 • Yuxiang Wang, Xiao Yan, Chuang Hu, Fangcheng Fu, Wentao Zhang, Hao Wang, Shuo Shang, Jiawei Jiang
For graph self-supervised learning (GSSL), masked autoencoder (MAE) follows the generative paradigm and learns to reconstruct masked graph edges or node features.
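To make the generative paradigm concrete, here is a toy PyTorch sketch of a masked autoencoder over node features: masked nodes are replaced by a learnable mask token, encoded with one propagation step, and the reconstruction loss is computed on the masked nodes only. It is a didactic sketch, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class TinyGraphMAE(nn.Module):
    """Toy masked autoencoder over node features (not the paper's model)."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, in_dim)
        self.mask_token = nn.Parameter(torch.zeros(in_dim))

    def forward(self, x, adj, mask):
        x_masked = x.clone()
        x_masked[mask] = self.mask_token               # replace masked node features
        h = torch.relu(adj @ self.encoder(x_masked))   # one message-passing step
        return self.decoder(h)

# Toy graph: 5 nodes, 8-dim features, row-normalized adjacency.
x = torch.randn(5, 8)
adj = torch.eye(5) + torch.rand(5, 5).round()
adj = adj / adj.sum(dim=1, keepdim=True)
mask = torch.tensor([True, False, True, False, False])

model = TinyGraphMAE(8, 16)
recon = model(x, adj, mask)
loss = ((recon[mask] - x[mask]) ** 2).mean()  # reconstruct only the masked nodes
loss.backward()
```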
1 code implementation • 5 Jul 2023 • Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui
Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models.
2 code implementations • 27 May 2023 • Zihao Yu, Haoyang Li, Fangcheng Fu, Xupeng Miao, Bin Cui
The key intuition behind our approach is to utilize the semantic mapping between minor modifications to the input text and the affected regions of the output image.
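The snippet below sketches only the first half of that intuition: detecting which prompt tokens were modified between two generations (here with difflib). How those token-level edits map to affected image regions is the paper's contribution and is not reproduced here; the function name is hypothetical.

```python
import difflib

def changed_tokens(old_prompt, new_prompt):
    """Return the token spans that differ between two prompts. A system like the
    one described could use such a diff to decide which image regions to update."""
    old, new = old_prompt.split(), new_prompt.split()
    matcher = difflib.SequenceMatcher(a=old, b=new)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            edits.append((op, old[i1:i2], new[j1:j2]))
    return edits

print(changed_tokens("a red car on a beach", "a blue car on a beach"))
# [('replace', ['red'], ['blue'])] -> only the region tied to the color token changes.
```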
no code implementations • 6 Mar 2023 • Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui
Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models.
no code implementations • 29 Jul 2022 • Fangcheng Fu, Xupeng Miao, Jiawei Jiang, Huanran Xue, Bin Cui
Vertical federated learning (VFL) is an emerging paradigm that allows different parties (e.g., organizations or enterprises) to collaboratively build machine learning models with privacy protection.
no code implementations • 16 Jun 2022 • Fangcheng Fu, Huanran Xue, Yong Cheng, Yangyu Tao, Bin Cui
First, to address the functionality of VFL models, we propose the federated source layers to unite the data from different parties.
no code implementations • 26 Dec 2021 • Shicheng Gao, Jie Xu, Xiaosen Li, Fangcheng Fu, Wentao Zhang, Wen Ouyang, Yangyu Tao, Bin Cui
For example, with our divide-and-conquer technique, the distributed K-core decomposition algorithm can scale to a large graph with 136 billion edges without losing correctness.
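For readers unfamiliar with K-core decomposition, the single-machine reference implementation of the classic peeling algorithm below shows what the distributed algorithm computes; the divide-and-conquer and distribution machinery from the paper are not reproduced here.

```python
from collections import defaultdict

def core_decomposition(edges):
    """Peeling algorithm: repeatedly remove the minimum-degree vertex; the degree
    at removal time is its core number. (Single-machine reference only.)"""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    core = {}
    remaining = set(adj)
    while remaining:
        v = min(remaining, key=lambda x: degree[x])
        core[v] = degree[v]
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] = max(degree[u] - 1, core[v])  # never drop below core[v]
    return core

print(core_decomposition([(1, 2), (2, 3), (1, 3), (3, 4)]))
# The triangle 1-2-3 forms a 2-core; vertex 4 has core number 1.
```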
no code implementations • 3 Jul 2019 • Fangcheng Fu, Jiawei Jiang, Yingxia Shao, Bin Cui
Gradient boosting decision tree (GBDT) is a widely used machine learning algorithm in both data analytics competitions and real-world industrial applications.
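As a quick reminder of the model class, here is a minimal single-machine GBDT run with scikit-learn on synthetic data; it only illustrates the kind of model the paper discusses and says nothing about the systems aspects studied there.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# GBDT fits an ensemble of shallow trees, each trained on the residuals of its predecessors.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbdt.fit(X_tr, y_tr)
print("test accuracy:", gbdt.score(X_te, y_te))
```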