Search Results for author: Zhenheng Tang

Found 14 papers, 7 papers with code

VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting

1 code implementation • 25 Mar 2024 • Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang

Combining CNNs or ViTs with RNNs for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics.
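As a rough illustration of the general recipe this abstract refers to (a spatial encoder feeding a recurrent cell), the sketch below pairs a small CNN with an LSTM. It is a generic stand-in, not the VMRNN (Vision Mamba + LSTM) architecture itself, and all module names are hypothetical.

```python
# Minimal sketch: a CNN encoder feeding an LSTM for frame-sequence forecasting.
# Generic CNN+RNN baseline only, NOT the VMRNN (Vision Mamba + LSTM) model.
import torch
import torch.nn as nn

class TinySpatioTemporalForecaster(nn.Module):
    def __init__(self, channels=1, hidden=64, height=32, width=32):
        super().__init__()
        self.encoder = nn.Sequential(               # spatial feature extractor
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),  # -> (B*T, 16*8*8)
        )
        self.rnn = nn.LSTM(16 * 8 * 8, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, channels * height * width)
        self.out_shape = (channels, height, width)

    def forward(self, frames):                      # frames: (B, T, C, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.rnn(feats)                 # summarize the sequence
        return self.decoder(h[-1]).view(b, *self.out_shape)  # predict next frame

x = torch.randn(2, 10, 1, 32, 32)                   # 2 sequences of 10 frames
print(TinySpatioTemporalForecaster()(x).shape)      # torch.Size([2, 1, 32, 32])
```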

FedImpro: Measuring and Improving Client Update in Federated Learning

no code implementations • 10 Feb 2024 • Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu

First, we analyze the generalization contribution of local training and conclude that it is bounded by the conditional Wasserstein distance between the data distributions of different clients.

Federated Learning
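To make the bound above more concrete, here is a rough empirical illustration of a label-conditional Wasserstein distance between two clients: compare the per-class feature distributions and average over classes. The 1-D features, uniform class weighting, and toy data are simplifying assumptions, not the exact quantity analyzed in FedImpro.

```python
# Per-class (conditional) 1-D Wasserstein distance between two clients,
# averaged over the classes both clients share. Illustration only.
import numpy as np
from scipy.stats import wasserstein_distance

def conditional_w1(x_a, y_a, x_b, y_b):
    classes = np.intersect1d(np.unique(y_a), np.unique(y_b))
    dists = [wasserstein_distance(x_a[y_a == c], x_b[y_b == c]) for c in classes]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
# Client A and B: same labels, but client B's features are shifted (heterogeneity).
y_a, y_b = rng.integers(0, 3, 500), rng.integers(0, 3, 500)
x_a = rng.normal(loc=y_a, scale=1.0)
x_b = rng.normal(loc=y_b + 0.5, scale=1.0)
print(conditional_w1(x_a, y_a, x_b, y_b))   # roughly 0.5, the size of the shift
```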

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

no code implementations • 3 Sep 2023 • Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Bingsheng He, Xiaowen Chu

The rapid growth of memory and computation requirements of large language models (LLMs) has outpaced the development of hardware, hindering people who lack large-scale high-end GPUs from training or deploying LLMs.

Scheduling

FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training

1 code implementation • 3 Mar 2023 • Zhenheng Tang, Xiaowen Chu, Ryan Yide Ran, Sunwoo Lee, Shaohuai Shi, Yonggang Zhang, Yuxin Wang, Alex Qiaozhong Liang, Salman Avestimehr, Chaoyang He

It improves training efficiency, remarkably relaxes the hardware requirements, and supports efficient large-scale FL experiments with stateful clients by: (1) training clients sequentially on devices; (2) decomposing the original aggregation into local aggregation on devices and global aggregation on the server; (3) scheduling tasks to mitigate straggler problems and improve compute utilization; (4) using a distributed client state manager to support various FL algorithms.

Federated Learning, Scheduling
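A hedged sketch of ideas (1) and (2) from the abstract above: clients assigned to one simulated device are trained one after another, each device pre-aggregates its own clients' updates, and the server only averages the per-device results. The toy quadratic objective and all function names are hypothetical; this mimics the scheduling idea only and is not the FedML Parrot codebase.

```python
# Sequential client training with two-level (local-then-global) aggregation.
import numpy as np

def local_sgd(global_model, client_data, lr=0.1, steps=5):
    w = global_model.copy()
    for _ in range(steps):                      # toy quadratic objective
        grad = w - client_data.mean(axis=0)
        w -= lr * grad
    return w

def run_round(global_model, devices):
    device_means = []
    for clients in devices:                     # each device hosts several clients
        updates = [local_sgd(global_model, d) for d in clients]  # sequential
        device_means.append(np.mean(updates, axis=0))            # local aggregation
    return np.mean(device_means, axis=0)                         # global aggregation

rng = np.random.default_rng(0)
devices = [[rng.normal(c, 1.0, size=(50, 4)) for c in range(3)] for _ in range(2)]
model = np.zeros(4)
for _ in range(10):
    model = run_round(model, devices)
print(model.round(2))                           # converges near the overall mean
```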

NAS-LID: Efficient Neural Architecture Search with Local Intrinsic Dimension

1 code implementation • 23 Nov 2022 • Xin He, Jiangchao Yao, Yuxin Wang, Zhenheng Tang, Ka Chu Cheung, Simon See, Bo Han, Xiaowen Chu

One-shot neural architecture search (NAS) substantially improves the search efficiency by training one supernet to estimate the performance of every possible child architecture (i.e., subnet).

Neural Architecture Search
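The toy weight-sharing supernet below illustrates the one-shot idea the abstract refers to: each layer owns several candidate operations, a subnet is one op choice per layer, and sampled subnets are scored with the shared weights. It does not implement NAS-LID's local-intrinsic-dimension measure; the class and layer choices are assumptions for illustration.

```python
# Toy weight-sharing supernet: sample a subnet (one op per layer) and score it.
import random
import torch
import torch.nn as nn

class SuperNet(nn.Module):
    def __init__(self, dim=16, layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.ModuleList([nn.Linear(dim, dim),
                           nn.Sequential(nn.Linear(dim, dim), nn.ReLU())])
            for _ in range(layers)
        )
        self.head = nn.Linear(dim, 1)

    def forward(self, x, choices):              # choices: one op index per layer
        for ops, c in zip(self.layers, choices):
            x = ops[c](x)
        return self.head(x)

net = SuperNet()
x, y = torch.randn(64, 16), torch.randn(64, 1)
for _ in range(5):                              # sample and score random subnets
    subnet = [random.randrange(2) for _ in net.layers]
    loss = nn.functional.mse_loss(net(x, subnet), y)
    print(subnet, round(loss.item(), 3))        # all subnets reuse the same weights
```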

Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

1 code implementation • 6 Jun 2022 • Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, Xiaowen Chu

In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift.

Federated Learning

FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks

1 code implementation • 22 Nov 2021 • Chaoyang He, Alay Dilipbhai Shah, Zhenheng Tang, Di Fan, Adarshan Naiynar Sivashunmugam, Keerti Bhogaraju, Mita Shimpi, Li Shen, Xiaowen Chu, Mahdi Soltanolkotabi, Salman Avestimehr

To bridge the gap and facilitate the development of FL for computer vision tasks, in this work, we propose a federated learning library and benchmarking framework, named FedCV, to evaluate FL on the three most representative computer vision tasks: image classification, image segmentation, and object detection.

Benchmarking, Federated Learning +5

A Quantitative Survey of Communication Optimizations in Distributed Deep Learning

1 code implementation • 27 May 2020 • Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li

In this article, we present a quantitative survey of communication optimization techniques for data parallel distributed DL.

Scheduling

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

no code implementations • 10 Mar 2020 • Zhenheng Tang, Shaohuai Shi, Wei Wang, Bo Li, Xiaowen Chu

In this paper, we provide a comprehensive survey of the communication-efficient distributed training algorithms, focusing on both system-level and algorithmic-level optimizations.

Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection

no code implementations • 22 Feb 2020 • Zhenheng Tang, Shaohuai Shi, Xiaowen Chu

Each worker only needs to communicate with a single peer at each communication round with a highly compressed model, which can significantly reduce the communication traffic on the worker.

Federated Learning
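A hedged sketch of the single-peer exchange described above: each round a worker picks one peer, sends only the top-k entries of its model by magnitude, and the pair averages what was exchanged. Random peer pairing stands in for the paper's adaptive peer selection, and the symmetric averaging rule is a simplifying assumption.

```python
# Top-k-compressed, single-peer gossip between workers (simplified).
import numpy as np

def top_k_sparse(vec, k):
    idx = np.argpartition(np.abs(vec), -k)[-k:]   # k largest-magnitude entries
    return idx, vec[idx]

def gossip_round(models, k, rng):
    order = rng.permutation(len(models))
    for i in range(0, len(order) - 1, 2):         # each worker talks to ONE peer
        a, b = order[i], order[i + 1]
        for src, dst in ((a, b), (b, a)):
            idx, vals = top_k_sparse(models[src], k)
            models[dst][idx] = 0.5 * (models[dst][idx] + vals)

def spread(models):                               # how far apart the workers are
    return np.mean([np.linalg.norm(m - models[0]) for m in models[1:]])

rng = np.random.default_rng(0)
models = [rng.normal(size=100) for _ in range(8)]
print(round(spread(models), 2))
for _ in range(50):
    gossip_round(models, k=10, rng=rng)
print(round(spread(models), 2))                   # typically smaller after gossip
```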

Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees

no code implementations • 20 Nov 2019 • Shaohuai Shi, Zhenheng Tang, Qiang Wang, Kaiyong Zhao, Xiaowen Chu

To reduce the long training time of large deep neural network (DNN) models, distributed synchronous stochastic gradient descent (S-SGD) is commonly used on a cluster of workers.

Distributed Optimization
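The snippet below contrasts the layer-wise selection the title refers to with a single global top-k over the flattened gradient: keeping a fixed fraction per layer means no layer is starved just because its gradients are small in magnitude. Thresholds, error feedback, and the paper's convergence analysis are all omitted; this is a simplified stand-in.

```python
# Layer-wise top-k gradient sparsification (fixed fraction kept per layer).
import numpy as np

def layerwise_topk(grads, ratio):
    out = []
    for g in grads:                                   # one top-k mask per layer
        k = max(1, int(ratio * g.size))
        idx = np.argpartition(np.abs(g), -k)[-k:]
        sparse = np.zeros_like(g)
        sparse[idx] = g[idx]
        out.append(sparse)
    return out

rng = np.random.default_rng(0)
# Two "layers" with very different gradient scales.
grads = [rng.normal(scale=s, size=n) for s, n in ((1.0, 1000), (0.1, 1000))]
sparse = layerwise_topk(grads, ratio=0.01)
# A single global top-k would mostly ignore the small-scale layer;
# layer-wise selection keeps 1% of the entries of every layer.
print([int(np.count_nonzero(s)) for s in sparse])     # [10, 10]
```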

Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training

no code implementations • 15 Sep 2019 • Yuxin Wang, Qiang Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Kaiyong Zhao, Xiaowen Chu

Unlike existing end-to-end benchmarks, which only report training time, we investigate the impact of hardware, the vendor's software library, and the deep learning framework on the performance and energy consumption of AI training.

Benchmarking
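As a rough sketch of measuring energy alongside training time, the snippet below samples GPU power with pynvml in a background thread and integrates it over the timed workload. It assumes an NVIDIA GPU with the `pynvml` package installed, the `measure` helper is hypothetical, and the actual methodology in the paper is considerably more involved.

```python
# Sample GPU power during a workload and report elapsed time plus rough energy.
import threading
import time
import pynvml

def measure(workload, interval=0.1):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    samples, stop = [], threading.Event()

    def poll():
        while not stop.is_set():
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # watts
            time.sleep(interval)

    t = threading.Thread(target=poll)
    t.start()
    start = time.time()
    workload()
    elapsed = time.time() - start
    stop.set()
    t.join()
    pynvml.nvmlShutdown()
    avg_watts = sum(samples) / max(1, len(samples))
    return elapsed, avg_watts * elapsed               # seconds, approximate joules

elapsed, joules = measure(lambda: time.sleep(2))      # replace with a training step
print(f"{elapsed:.1f}s, ~{joules:.0f} J")
```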

A Distributed Synchronous SGD Algorithm with Global Top-$k$ Sparsification for Low Bandwidth Networks

1 code implementation • 14 Jan 2019 • Shaohuai Shi, Qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu

Current methods that use AllGather to accumulate the sparse gradients have a communication complexity of $O(kP)$, where $P$ is the number of workers, which is inefficient on low bandwidth networks with a large number of workers.
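The single-process simulation below illustrates the $O(kP)$ cost the sentence refers to: with AllGather, every worker ends up holding all $P$ workers' $k$ (index, value) pairs, so the received volume grows linearly in the number of workers. The paper's global top-$k$ alternative is not implemented here; array sizes are arbitrary.

```python
# Why AllGather of per-worker top-k sparse gradients costs O(kP) per worker.
import numpy as np

def top_k_pairs(grad, k):
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

rng = np.random.default_rng(0)
P, k, dim = 16, 32, 10_000
grads = [rng.normal(size=dim) for _ in range(P)]

gathered = [top_k_pairs(g, k) for g in grads]      # what AllGather would deliver
received_pairs = sum(len(idx) for idx, _ in gathered)
print(received_pairs)                              # k * P = 512 pairs per worker

dense_sum = np.zeros(dim)                          # each worker then accumulates
for idx, vals in gathered:
    dense_sum[idx] += vals
```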
