1 code implementation • 24 Dec 2024 • Changfu Xu, Jianxiong Guo, WanYu Lin, Haodong Zou, Wentao Fan, Tian Wang, Xiaowen Chu, Jiannong Cao
The LAD-TS generates a near-optimal offloading decision by leveraging the diffusion model's conditional generation capability and reinforcement learning's environment-interaction ability, thereby minimizing service delays under multiple resource constraints.
1 code implementation • 27 Oct 2024 • Zhenheng Tang, Yonggang Zhang, Peijie Dong, Yiu-ming Cheung, Amelie Chi Zhou, Bo Han, Xiaowen Chu
In this work, we provide a causal view to find that this performance drop of OFL methods comes from the isolation problem: models trained locally in isolation in OFL may easily fit spurious correlations due to data heterogeneity.
1 code implementation • 24 Oct 2024 • Qi Li, Xiang Liu, Zhenheng Tang, Peijie Dong, Zeyu Li, Xinglin Pan, Xiaowen Chu
Our findings indicate that current editing methods are only suitable for small-scale knowledge updates within language models, which motivates further research on more practical and reliable editing methods.
no code implementations • 23 Oct 2024 • Xin He, Shunkang Zhang, Yuxin Wang, Haiyan Yin, Zihao Zeng, Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Ivor Tsang, Ong Yew Soon
To tackle these inference-specific challenges, we introduce ExpertFlow, a comprehensive system specifically designed to enhance inference efficiency by accommodating flexible routing and enabling efficient expert scheduling between CPU and GPU.
no code implementations • 16 Oct 2024 • Zhenheng Tang, Xueze Kang, Yiming Yin, Xinglin Pan, Yuxin Wang, Xin He, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Amelie Chi Zhou, Bo Li, Bingsheng He, Xiaowen Chu
To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly large language models (LLMs), we present FusionLLM, a decentralized training system designed and implemented for training DNNs using geo-distributed GPUs across different computing clusters or individual devices.
no code implementations • 7 Oct 2024 • Peijie Dong, Lujun Li, Xiang Liu, Zhenheng Tang, Xuebo Liu, Qiang Wang, Xiaowen Chu
Specifically, we model the ZC proxy as a symbolic equation and incorporate a unified proxy search space that encompasses existing ZC proxies, which are composed of a predefined set of mathematical symbols.
1 code implementation • 5 Oct 2024 • Xiang Liu, Peijie Dong, Xuming Hu, Xiaowen Chu
Current long-context benchmarks primarily focus on retrieval-based tests, requiring Large Language Models (LLMs) to locate specific information within extensive input contexts, such as the needle-in-a-haystack (NIAH) benchmark.
no code implementations • 27 Aug 2024 • Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, Xiaowen Chu
Current data compression methods, such as sparsification in Federated Averaging (FedAvg), effectively enhance the communication efficiency of Federated Learning (FL).
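As a rough illustration of the sparsification idea mentioned above (a minimal sketch, not the specific scheme studied in the paper), top-k gradient sparsification transmits only the largest-magnitude entries of each update:

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries of a gradient tensor.

    Returns the indices and values to transmit plus the local residual
    (the dropped entries, often accumulated as error feedback).
    """
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # k largest by magnitude
    values = flat[idx]
    residual = flat.copy()
    residual[idx] = 0.0                            # kept locally, not transmitted
    return idx, values, residual.reshape(grad.shape)

# Example: transmit 1% of a 10,000-element gradient
g = np.random.randn(100, 100)
idx, vals, res = topk_sparsify(g, k=100)
print(idx.shape, vals.shape)                       # (100,) (100,)
```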
no code implementations • 15 Aug 2024 • Shengyuan Ye, Liekang Zeng, Xiaowen Chu, Guoliang Xing, Xu Chen
We implement Asteroid on heterogeneous edge devices with both vision and language models, demonstrating up to 12.2x faster training than conventional parallelism methods and 2.1x faster than state-of-the-art hybrid parallelism methods through evaluations.
no code implementations • 3 Aug 2024 • Peijie Dong, Lujun Li, Yuedong Zhong, Dayou Du, Ruibo Fan, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Yike Guo, Xiaowen Chu
In this paper, we present the first structural binarization method for LLM compression to less than 1-bit precision.
no code implementations • 24 Jul 2024 • Penglei Sun, Yaoxian Song, Xiang Liu, Xiaofei Yang, Qiang Wang, Tiefeng Li, Yang Yang, Xiaowen Chu
3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments.
no code implementations • 3 Jul 2024 • Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu
However, they fail to explore the cross-modal representation of language-vision alignment in the cross-domain setting.
1 code implementation • 30 Jun 2024 • Xinglin Pan, WenXiang Lin, Shaohuai Shi, Xiaowen Chu, Weinong Sun, Bo Li
Sparsely-activated Mixture-of-Expert (MoE) layers have found practical applications in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands.
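For readers unfamiliar with why compute grows only sub-linearly, here is a minimal top-1 MoE routing sketch (an illustrative assumption, not the system proposed in the paper): each token activates a single expert regardless of how many experts exist.

```python
import numpy as np

def moe_layer(x, gate_w, experts):
    """Top-1 Mixture-of-Experts routing: each token is sent to one expert.

    x:       (tokens, d_model) activations
    gate_w:  (d_model, n_experts) gating weights
    experts: list of callables mapping (n, d_model) -> (n, d_model)
    """
    logits = x @ gate_w                                   # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    choice = probs.argmax(axis=1)                         # top-1 expert per token
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():                                    # only selected experts run
            out[mask] = probs[mask, e:e + 1] * expert(x[mask])
    return out

experts = [lambda z: np.tanh(z) for _ in range(4)]        # toy experts
y = moe_layer(np.random.randn(16, 32), np.random.randn(32, 4), experts)
```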
no code implementations • 23 Jun 2024 • Xianda Chen, Xu Han, Meixin Zhu, Xiaowen Chu, PakHin Tiu, Xinhu Zheng, Yinhai Wang
To overcome these limitations, we propose the Editable Behavior Generation (EBG) model, a data-driven car-following model that allows for adjusting driving discourtesy levels.
no code implementations • 15 Jun 2024 • Xu Han, Qiannan Yang, Xianda Chen, Xiaowen Chu, Meixin Zhu
Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy.
1 code implementation • 5 Jun 2024 • Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu
In particular, we devise an elaborate search space encompassing the existing pruning metrics to discover the potential symbolic pruning metric.
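As an example of the kind of existing pruning metric such a search space contains (plain weight-magnitude scoring is an illustrative assumption here, not the metric discovered by the paper):

```python
import numpy as np

def magnitude_prune_mask(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Keep the largest-magnitude weights; zero out a `sparsity` fraction."""
    k = int(w.size * sparsity)                    # number of weights to remove
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    return (np.abs(w) >= threshold).astype(w.dtype)

w = np.random.randn(256, 256)
mask = magnitude_prune_mask(w, sparsity=0.5)
print(mask.mean())                                # ~0.5 of the weights are kept
```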
no code implementations • 27 May 2024 • Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen
Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistants in smart homes.
1 code implementation • 1 May 2024 • Dayou Du, Gu Gong, Xiaowen Chu
Model quantization converts high-precision numbers to lower precision, reducing the computational demands and memory needs of ViTs and allowing the creation of hardware specifically optimized for these quantized algorithms, which boosts efficiency.
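To make the high-to-low precision conversion concrete, a generic uniform affine quantizer is sketched below (the works surveyed use a variety of more elaborate ViT-specific schemes; this is only the baseline idea):

```python
import numpy as np

def quantize_uniform(w: np.ndarray, n_bits: int = 8):
    """Uniform affine quantization of a tensor to n_bits unsigned integers."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = round(-w.min() / scale)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_uniform(w)
print(np.abs(w - dequantize(q, s, z)).max())      # small reconstruction error
```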
1 code implementation • 25 Mar 2024 • Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang
Combining CNNs or ViTs with RNNs for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics.
2 code implementations • 16 Feb 2024 • Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu
The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges.
no code implementations • 10 Feb 2024 • Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu
First, we analyze the generalization contribution of local training and conclude that this generalization contribution is bounded by the conditional Wasserstein distance between the data distributions of different clients.
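For reference, the plain (unconditional) 1-Wasserstein distance between two client data distributions takes the standard form below; the paper's bound uses a label-conditioned variant, which is not reproduced here.

```latex
W_1(\mu_i, \mu_j) \;=\; \inf_{\gamma \in \Pi(\mu_i, \mu_j)} \int \lVert x - x' \rVert \,\mathrm{d}\gamma(x, x')
```

Here \mu_i and \mu_j are the data distributions of clients i and j, and \Pi(\mu_i, \mu_j) is the set of all couplings of the two distributions.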
no code implementations • 3 Feb 2024 • Peijie Dong, Lujun Li, Xinglin Pan, Zimian Wei, Xiang Liu, Qiang Wang, Xiaowen Chu
Recent advancements in Zero-shot Neural Architecture Search (NAS) highlight the efficacy of zero-cost proxies in various NAS benchmarks.
no code implementations • 14 Dec 2023 • Qingsong Yan, Qiang Wang, Kaiyong Zhao, Jie Chen, Bo Li, Xiaowen Chu, Fei Deng
Neural Radiance Fields (NeRF) have demonstrated impressive performance in novel view synthesis.
no code implementations • 7 Nov 2023 • Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu
For end users, our benchmark and findings help better understand different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs.
no code implementations • 3 Sep 2023 • Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Bingsheng He, Xiaowen Chu
The rapid growth of memory and computation requirements of large language models (LLMs) has outpaced the development of hardware, hindering people who lack large-scale high-end GPUs from training or deploying LLMs.
no code implementations • 30 Aug 2023 • Xu Han, Xianda Chen, Meixin Zhu, Pinlong Cai, Jianshan Zhou, Xiaowen Chu
The experimental results illustrate that EnsembleFollower reproduces human-like behavior with improved accuracy and combines hybrid models effectively, demonstrating that our proposed framework can handle diverse car-following conditions by leveraging the strengths of various low-level models.
no code implementations • 7 Aug 2023 • Longteng Zhang, Lin Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li
The low-rank adaptation (LoRA) method can largely reduce the number of trainable parameters for fine-tuning large language models (LLMs); however, it still requires expensive activation memory to update the low-rank weights.
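As background, the standard LoRA reparameterization the abstract builds on replaces a full weight update with a low-rank product; a minimal sketch follows (the paper's activation-memory optimization itself is not shown):

```python
import numpy as np

class LoRALinear:
    """Frozen pretrained weight W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, w: np.ndarray, r: int = 8, alpha: float = 16.0):
        d_out, d_in = w.shape
        self.w = w                                 # frozen, never updated
        self.a = np.random.randn(r, d_in) * 0.01   # trainable, r x d_in
        self.b = np.zeros((d_out, r))              # trainable, d_out x r (zero init)
        self.scale = alpha / r

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in); only A and B receive gradients during fine-tuning
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T
```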
1 code implementation • 15 Jun 2023 • Lin Zhang, Longteng Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li
To accelerate distributed training, many gradient compression methods have been proposed to alleviate the communication bottleneck in synchronous stochastic gradient descent (S-SGD), but their efficacy in real-world applications still remains unclear.
1 code implementation • 3 Mar 2023 • Zhenheng Tang, Xiaowen Chu, Ryan Yide Ran, Sunwoo Lee, Shaohuai Shi, Yonggang Zhang, Yuxin Wang, Alex Qiaozhong Liang, Salman Avestimehr, Chaoyang He
It improves the training efficiency, remarkably relaxes the requirements on the hardware, and supports efficient large-scale FL experiments with stateful clients by: (1) sequentially training clients on devices; (2) decomposing the original aggregation into local and global aggregation on devices and the server, respectively; (3) scheduling tasks to mitigate straggler problems and enhance computing utility; (4) a distributed client state manager to support various FL algorithms.
1 code implementation • 24 Feb 2023 • Lin Zhang, Shaohuai Shi, Xiaowen Chu, Wei Wang, Bo Li, Chengjian Liu
Communication scheduling has been shown to be effective in accelerating distributed training, which enables all-reduce communications to be overlapped with backpropagation computations.
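A toy, single-process sketch of the overlap idea (assumed semantics only; real systems use asynchronous NCCL/MPI collectives rather than a thread pool): per-layer all-reduce calls are launched as soon as backpropagation produces each gradient, and the trainer waits only at the end.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def allreduce(grad):
    """Stand-in for a real all-reduce; a real system would sum across workers."""
    return grad

def backward_with_overlap(layer_grads):
    """Launch communication for each layer's gradient as it becomes available."""
    with ThreadPoolExecutor(max_workers=1) as comm:
        handles = [comm.submit(allreduce, g) for g in layer_grads]  # start early
        return [h.result() for h in handles]                        # sync at the end

grads = (np.random.randn(256, 256) for _ in range(4))  # toy per-layer gradients
reduced = backward_with_overlap(grads)
```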
no code implementations • 27 Jan 2023 • Yaoxian Song, Penglei Sun, Piaopiao Jin, Yi Ren, Yu Zheng, Zhixu Li, Xiaowen Chu, Yue Zhang, Tiefeng Li, Jason Gu
From the perspective of robotic cognition, we design a two-stage fine-grained robotic grasping framework (named LangPartGPD), including a novel 3D part language grounding model and a part-aware grasp pose detection model, in which explicit language input from humans or large language models (LLMs) can guide a robot to generate part-level 6-DoF grasping poses with textual explanations.
1 code implementation • 30 Nov 2022 • Qingsong Yan, Qiang Wang, Kaiyong Zhao, Bo Li, Xiaowen Chu, Fei Deng
Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume and may fail when the range is too large or unreliable.
1 code implementation • 23 Nov 2022 • Xin He, Jiangchao Yao, Yuxin Wang, Zhenheng Tang, Ka Chu Cheung, Simon See, Bo Han, Xiaowen Chu
One-shot neural architecture search (NAS) substantially improves the search efficiency by training one supernet to estimate the performance of every possible child architecture (i.e., subnet).
Ranked #26 on Neural Architecture Search on NAS-Bench-201, CIFAR-10
no code implementations • 29 Aug 2022 • Qingsong Yan, Qiang Wang, Kaiyong Zhao, Bo Li, Xiaowen Chu, Fei Deng
The panorama image can simultaneously demonstrate complete information of the surrounding environment and has many advantages in virtual tourism, games, robotics, etc.
Ranked #17 on Depth Estimation on Stanford2D3D Panoramic
1 code implementation • 20 Jul 2022 • Qiang Wang, Shaohuai Shi, Kaiyong Zhao, Xiaowen Chu
However, existing NAS studies on the dense prediction task, especially stereo matching, still cannot be efficiently and effectively deployed on devices of different computing capabilities.
1 code implementation • 6 Jun 2022 • Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, Xiaowen Chu
In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift.
2 code implementations • 30 May 2022 • Yongqi Zhang, Zhanke Zhou, Quanming Yao, Xiaowen Chu, Bo Han
An important design component of GNN-based KG reasoning methods is called the propagation path, which contains a set of involved entities in each propagation step.
1 code implementation • 30 Nov 2021 • Guohao Ying, Xin He, Bin Gao, Bo Han, Xiaowen Chu
Some recent works try to search both generator (G) and discriminator (D), but they suffer from the instability of GAN training.
Ranked #16 on Image Generation on STL-10
1 code implementation • 22 Nov 2021 • Chaoyang He, Alay Dilipbhai Shah, Zhenheng Tang, Di Fan, Adarshan Naiynar Sivashunmugam, Keerti Bhogaraju, Mita Shimpi, Li Shen, Xiaowen Chu, Mahdi Soltanolkotabi, Salman Avestimehr
To bridge the gap and facilitate the development of FL for computer vision tasks, in this work, we propose a federated learning library and benchmarking framework, named FedCV, to evaluate FL on the three most representative computer vision tasks: image classification, image segmentation, and object detection.
no code implementations • 12 Oct 2021 • Vihan Lakshman, Choon Hui Teo, Xiaowen Chu, Priyanka Nigam, Abhinandan Patni, Pooja Maknikar, SVN Vishwanathan
When training a dyadic model, one seeks to embed two different types of entities (e.g., queries and documents or users and movies) in a common vector space such that pairs with high relevance are positioned nearby.
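A minimal two-tower sketch of such a dyadic model (the architecture and training details here are illustrative assumptions): both entity types are embedded in the same space and relevance is scored with a dot product.

```python
import numpy as np

class TwoTower:
    """Embed two entity types (e.g., queries and documents) in one vector space."""

    def __init__(self, n_queries, n_docs, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.q_emb = rng.normal(scale=0.1, size=(n_queries, dim))
        self.d_emb = rng.normal(scale=0.1, size=(n_docs, dim))

    def score(self, q_ids, d_ids):
        # Relevance of a (query, document) pair = dot product of their embeddings
        return np.sum(self.q_emb[q_ids] * self.d_emb[d_ids], axis=-1)

model = TwoTower(n_queries=1000, n_docs=5000)
print(model.score(np.array([0, 1]), np.array([42, 7])))
```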
no code implementations • 6 Oct 2021 • Qiang Wang, Shaohuai Shi, Shizhen Zheng, Kaiyong Zhao, Xiaowen Chu
The disparity estimation problem is increasingly addressed by DNNs, which achieve much better prediction accuracy than traditional hand-crafted feature-based methods.
no code implementations • 27 Jun 2021 • Rongfei Zeng, Chao Zeng, Xingwei Wang, Bo Li, Xiaowen Chu
Federated learning utilizes various resources provided by participants to collaboratively train a global model, which potentially addresses the data privacy issue of machine learning.
1 code implementation • 26 Jan 2021 • Xin He, Guohao Ying, Jiyong Zhang, Xiaowen Chu
We propose a new objective, namely potential, which can help exploit promising models to indirectly reduce the number of models involved in weights training, thus alleviating search instability.
2 code implementations • 14 Jan 2021 • Xin He, Shihao Wang, Xiaowen Chu, Shaohuai Shi, Jiangping Tang, Xin Liu, Chenggang Yan, Jiyong Zhang, Guiguang Ding
The experimental results show that our automatically searched models (CovidNet3D) outperform the baseline human-designed models on the three datasets with model sizes tens of times smaller and higher accuracy.
no code implementations • CVPR 2021 • Songyan Zhang, Zhicheng Wang, Qiang Wang, Jinshuo Zhang, Gang Wei, Xiaowen Chu
Existing state-of-the-art disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolution neural network (CNN) for disparity regression, which is inefficient due to the high memory consumption and slow inference speed.
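For context, the 4D concatenation volume referred to is typically built by concatenating left features with disparity-shifted right features; a minimal sketch of that common construction (not this paper's method):

```python
import numpy as np

def concat_cost_volume(feat_l, feat_r, max_disp):
    """Build a 4D concatenation volume of shape (2C, D, H, W) from stereo features.

    feat_l, feat_r: (C, H, W) left/right feature maps. For each candidate
    disparity d, the right features are shifted by d columns and concatenated
    with the left features along the channel axis.
    """
    c, h, w = feat_l.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=feat_l.dtype)
    for d in range(max_disp):
        volume[:c, d, :, d:] = feat_l[:, :, d:]
        volume[c:, d, :, d:] = feat_r[:, :, :w - d]
    return volume
```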
no code implementations • 20 Oct 2020 • Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu
Distributed training techniques have been widely deployed in large-scale deep neural networks (DNNs) training on dense-GPU clusters.
1 code implementation • 27 May 2020 • Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li
In this article, we present a quantitative survey of communication optimization techniques for data parallel distributed DL.
2 code implementations • 24 Mar 2020 • Qiang Wang, Shaohuai Shi, Shizhen Zheng, Kaiyong Zhao, Xiaowen Chu
Deep neural networks (DNNs) have achieved great success in the area of computer vision.
no code implementations • 10 Mar 2020 • Zhenheng Tang, Shaohuai Shi, Wei Wang, Bo Li, Xiaowen Chu
In this paper, we provide a comprehensive survey of the communication-efficient distributed training algorithms, focusing on both system-level and algorithmic-level optimizations.
no code implementations • 24 Feb 2020 • Qiang Wang, Shaohuai Shi, Canhui Wang, Xiaowen Chu
We thus propose a provable algorithm, AdaDUAL, to efficiently schedule those communication tasks.
no code implementations • 22 Feb 2020 • Rongfei Zeng, Shixun Zhang, Jiaqi Wang, Xiaowen Chu
In MEC, edge nodes are reluctant to participate in learning voluntarily, and they differ in the provision of multi-dimensional resources, both of which might deteriorate the performance of federated learning.
no code implementations • 22 Feb 2020 • Zhenheng Tang, Shaohuai Shi, Xiaowen Chu
2) Each worker only needs to communicate with a single peer at each communication round with a highly compressed model, which can significantly reduce the communication traffic on the worker.
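A toy sketch of this peer-wise exchange (the pairing and compression scheme here are assumptions for illustration, not necessarily the paper's topology): in each round every worker averages its model with exactly one peer.

```python
import numpy as np

def gossip_round(models, rng):
    """One round of single-peer model averaging among an even number of workers."""
    order = rng.permutation(len(models))
    for i, j in zip(order[0::2], order[1::2]):   # disjoint random pairs
        avg = 0.5 * (models[i] + models[j])
        models[i], models[j] = avg.copy(), avg
    return models

rng = np.random.default_rng(0)
models = [np.random.randn(10) for _ in range(4)]
models = gossip_round(models, rng)
```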
1 code implementation • 18 Feb 2020 • Shuoheng Yang, Yuxin Wang, Xiaowen Chu
In recent years, natural language processing (NLP) has advanced greatly with deep learning techniques.
1 code implementation • 20 Dec 2019 • Qiang Wang, Shizhen Zheng, Qingsong Yan, Fei Deng, Kaiyong Zhao, Xiaowen Chu
Besides, we present DTN-Net, a two-stage deep model for surface normal estimation.
1 code implementation • 18 Dec 2019 • Shaohuai Shi, Xiaowen Chu, Bo Li
Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters.
1 code implementation • 20 Nov 2019 • Shaohuai Shi, Xiaowen Chu, Ka Chun Cheung, Simon See
Distributed stochastic gradient descent (SGD) algorithms are widely deployed in training large-scale deep learning models, while the communication overhead among workers becomes the new system bottleneck.
no code implementations • 20 Nov 2019 • Shaohuai Shi, Zhenheng Tang, Qiang Wang, Kaiyong Zhao, Xiaowen Chu
To reduce the long training time of large deep neural network (DNN) models, distributed synchronous stochastic gradient descent (S-SGD) is commonly used on a cluster of workers.
no code implementations • 20 Nov 2019 • Xin He, Shihao Wang, Shaohuai Shi, Zhenheng Tang, Yuxin Wang, Zhihao Zhao, Jing Dai, Ronghao Ni, Xiaofeng Zhang, Xiaoming Liu, Zhili Wu, Wu Yu, Xiaowen Chu
Our results show that object detection can help improve the accuracy of some skin disease classes.
no code implementations • 15 Sep 2019 • Yuxin Wang, Qiang Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Kaiyong Zhao, Xiaowen Chu
Different from the existing end-to-end benchmarks which only present the training time, we investigate the impact of hardware, the vendor's software library, and the deep learning framework on the performance and energy consumption of AI training.
2 code implementations • 2 Aug 2019 • Xin He, Kaiyong Zhao, Xiaowen Chu
Deep learning (DL) techniques have penetrated all aspects of our lives and brought us great convenience.
1 code implementation • 14 Jan 2019 • Shaohuai Shi, Qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu
Current methods that use AllGather to accumulate the sparse gradients have a communication complexity of $O(kP)$, where $P$ is the number of workers, which is inefficient on low bandwidth networks with a large number of workers.
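A back-of-the-envelope illustration of the $O(kP)$ term (a toy simulation, not an actual collective): with AllGather, every worker receives the k (index, value) pairs contributed by each of the P workers, i.e. kP elements per iteration.

```python
import numpy as np

def allgather_topk(worker_grads, k):
    """Simulate AllGather of top-k sparse gradients from P workers."""
    gathered = []
    for g in worker_grads:                               # one entry per worker
        flat = g.ravel()
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        gathered.append((idx, flat[idx]))
    received = sum(len(idx) for idx, _ in gathered)      # elements each worker gets
    return gathered, received

P, k = 8, 100
grads = [np.random.randn(10_000) for _ in range(P)]
_, received = allgather_topk(grads, k)
print(received)                                          # 800 == k * P
```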
2 code implementations • 27 Nov 2018 • Shaohuai Shi, Xiaowen Chu, Bo Li
Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks on computer clusters.
Distributed, Parallel, and Cluster Computing
no code implementations • 30 Jul 2018 • Xianyan Jia, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, Xiaowen Chu
(3) We propose highly optimized all-reduce algorithms that achieve up to 3x and 11x speedup on AlexNet and ResNet-50 respectively than NCCL-based training on a cluster with 1024 Tesla P40 GPUs.
1 code implementation • 16 Nov 2017 • Shaohuai Shi, Xiaowen Chu
Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry.
Distributed, Parallel, and Cluster Computing
no code implementations • 9 Nov 2017 • Pengfei Xu, Shaohuai Shi, Xiaowen Chu
We first benchmark the performance of system components (IO, CPU, and GPU) in a Docker container and the host system, and compare the results to see whether there is any difference.
no code implementations • 25 Apr 2017 • Shaohuai Shi, Xiaowen Chu
Rectifier neuron units (ReLUs) have been widely used in deep convolutional networks.
no code implementations • 10 Feb 2017 • Shaohuai Shi, Pengfei Xu, Xiaowen Chu
In this paper, we target optimizing the operation of multiplying a matrix with the transpose of another matrix (referred to as the NT operation hereafter), which contributes about half of the training time of fully connected deep neural networks.
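For clarity, the NT operation is a GEMM with the second operand transposed, C = A · Bᵀ; a minimal reference implementation is sketched below (not the optimized kernel the paper develops):

```python
import numpy as np

def gemm_nt(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """NT GEMM: multiply A (m x k) by the transpose of B (n x k), giving (m x n)."""
    assert a.shape[1] == b.shape[1], "A and B must share the inner dimension k"
    return a @ b.T

a = np.random.randn(64, 128)
b = np.random.randn(32, 128)
c = gemm_nt(a, b)          # shape (64, 32)
```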
no code implementations • 25 Aug 2016 • Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu
We first benchmark the running performance of these tools with three popular types of neural networks on two CPU platforms and three GPU platforms.
1 code implementation • 8 Sep 2015 • Xinxin Mei, Xiaowen Chu
Memory access efficiency is a key factor in fully utilizing the computational power of graphics processing units (GPUs).
Hardware Architecture • Distributed, Parallel, and Cluster Computing