Search Results for author: Chuan Wu

Found 38 papers, 13 papers with code

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

no code implementations1 Oct 2024 Sheng Wang, Liheng Chen, Pengan Chen, Jingwei Dong, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu

The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously.

HybridFlow: A Flexible and Efficient RLHF Framework

no code implementations28 Sep 2024 Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies between the NNs.
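
As a minimal illustration of this dataflow view (the node and edge names below are hypothetical and are not taken from the HybridFlow paper):

```python
# Illustrative only: a toy RLHF-style dataflow, with each node a neural-network
# computation and each edge a data dependency between nodes.
from collections import namedtuple

Edge = namedtuple("Edge", ["src", "dst", "data"])

# Hypothetical node names for a typical RLHF iteration (actor rollout,
# reward scoring, critic value estimation, actor/critic updates).
nodes = ["actor_generate", "reward_model", "critic_value",
         "actor_update", "critic_update"]

edges = [
    Edge("actor_generate", "reward_model", "responses"),
    Edge("actor_generate", "critic_value", "responses"),
    Edge("reward_model", "actor_update", "rewards"),
    Edge("critic_value", "actor_update", "values"),
    Edge("reward_model", "critic_update", "rewards"),
    Edge("critic_value", "critic_update", "values"),
]

# Simple dependency inspection: which upstream nodes each node waits on.
deps = {n: {e.src for e in edges if e.dst == n} for n in nodes}
for node in nodes:
    print(f"{node} depends on {sorted(deps[node]) or 'nothing'}")
```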

Large Language Model

How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models

no code implementations29 Aug 2024 Jiyue Jiang, Liheng Chen, Pengan Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages.

Benchmarking General Knowledge

ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development

no code implementations29 Jul 2024 Borui Wan, Mingji Han, Yiyao Sheng, Yanghua Peng, Haibin Lin, Mofan Zhang, Zhichao Lai, Menghan Yu, Junda Zhang, Zuquan Song, Xin Liu, Chuan Wu

In production, different LFMs are trained with various frameworks and storage backends, depending on model sizes and training scales.

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

1 code implementation2 Jul 2024 Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu

A number of production deep learning clusters have explored using inference hardware for DNN training during off-peak serving hours, when many inference GPUs sit idle.

Quantization

Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting

no code implementations24 Jun 2024 Jiyue Jiang, Liheng Chen, Sheng Wang, Lingpeng Kong, Yu Li, Chuan Wu

The thought generated by the progressive thought generator serves as a prompt to prevent the generated dialogue from having significant semantic deviations, while the psychology knowledge generator produces psychological knowledge to serve as the dialogue history for the LLM, guiding the dialogue generator to create multi-turn psychological dialogue.
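
A rough sketch of what such a staged pipeline could look like, assuming a generic text-generation client; the `call_llm` helper and the prompt wording are hypothetical, not the paper's implementation:

```python
# Hypothetical sketch of a knowledge-driven, progressive prompting loop.
# call_llm is a stand-in for any text-generation API; it is not a real library call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM client here")

def generate_dialogue(seed_topic: str, num_turns: int = 3) -> list[str]:
    history = []
    for _ in range(num_turns):
        # Progressive thought: constrains the next turn so it stays on topic.
        thought = call_llm(f"Topic: {seed_topic}\nHistory: {history}\nNext thought:")
        # Psychological knowledge: injected as extra context for the dialogue turn.
        knowledge = call_llm(f"Relevant psychological knowledge for: {thought}")
        turn = call_llm(f"Knowledge: {knowledge}\nThought: {thought}\n"
                        f"History: {history}\nNext dialogue turn:")
        history.append(turn)
    return history
```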

Data Augmentation Dialogue Generation

BG-HGNN: Toward Scalable and Efficient Heterogeneous Graph Neural Network

no code implementations13 Mar 2024 Junwei Su, Lingjun Mao, Chuan Wu

Many computer vision and machine learning problems are modelled as learning tasks on heterogeneous graphs, featuring a wide array of relations from diverse types of nodes and edges.

Graph Neural Network Relation

Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems

no code implementations13 Mar 2024 Junwei Su, Difan Zou, Chuan Wu

In this paper, we study the generalization performance of SGD with preconditioning for the least squares problem.
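
For context, a minimal numpy sketch of the preconditioned SGD update on a least-squares objective; this illustrates the general setting only, not the paper's specific preconditioner or analysis:

```python
# Preconditioned SGD on f(w) = (1/2n) * ||Xw - y||^2.
# The update is w <- w - lr * P @ g, where g is a stochastic gradient and
# P is a fixed, symmetric positive-definite preconditioning matrix.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Example preconditioner: inverse of the (regularized) data covariance.
P = np.linalg.inv(X.T @ X / n + 1e-3 * np.eye(d))

w = np.zeros(d)
lr, batch = 0.1, 16
for step in range(500):
    idx = rng.choice(n, size=batch, replace=False)
    g = X[idx].T @ (X[idx] @ w - y[idx]) / batch   # stochastic gradient
    w -= lr * P @ g

print("estimation error:", np.linalg.norm(w - w_true))
```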

regression

On the Topology Awareness and Generalization Performance of Graph Neural Networks

no code implementations7 Mar 2024 Junwei Su, Chuan Wu

Many computer vision and machine learning problems are modelled as learning tasks on graphs, where graph neural networks (GNNs) have emerged as a dominant tool for learning representations of graph-structured data. A key feature of GNNs is their use of graph structures as input, enabling them to exploit the graphs' inherent topological properties, known as the topology awareness of GNNs. Despite the empirical successes of GNNs, the influence of topology awareness on generalization performance remains unexplored, particularly for node-level tasks that diverge from the assumption of data being independent and identically distributed (IID). The precise definition and characterization of the topology awareness of GNNs, especially concerning different topological features, are still unclear. This paper introduces a comprehensive framework to characterize the topology awareness of GNNs across any topological feature. Using this framework, we investigate the effects of topology awareness on GNN generalization performance. Contrary to the prevailing belief that enhancing the topology awareness of GNNs is always advantageous, our analysis reveals a critical insight: improving the topology awareness of GNNs may inadvertently lead to unfair generalization across structural groups, which might not be desired in some scenarios. Additionally, we conduct a case study using an intrinsic graph metric, the shortest-path distance, on various benchmark datasets. The empirical results of this case study confirm our theoretical insights. Moreover, we demonstrate the practical applicability of our framework by using it to tackle the cold-start problem in graph active learning.

Active Learning

LoRA Meets Dropout under a Unified Framework

no code implementations25 Feb 2024 Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu

Hence, a possible contradiction arises between the negligible number of trainable parameters in LoRA and the effectiveness of previous dropout methods, which has been largely overlooked.

PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA

1 code implementation24 Feb 2024 Sheng Wang, Boyang Xue, Jiacheng Ye, Jiyue Jiang, Liheng Chen, Lingpeng Kong, Chuan Wu

We hope the conspicuously higher parameter efficiency can establish PRoLoRA as a resource-friendly alternative to LoRA.

MSPipe: Efficient Temporal GNN Training via Staleness-Aware Pipeline

1 code implementation23 Feb 2024 Guangming Sheng, Junwei Su, Chao Huang, Chuan Wu

However, the iterative reading and updating of the memory module in MTGNNs, needed to obtain up-to-date information, must follow the temporal dependencies.

Scheduling

Towards Robust Graph Incremental Learning on Evolving Graphs

no code implementations20 Feb 2024 Junwei Su, Difan Zou, Zijun Zhang, Chuan Wu

We provide a formal formulation and analysis of the problem, and propose a novel regularization-based technique called Structural-Shift-Risk-Mitigation (SSRM) to mitigate the impact of the structural shift on catastrophic forgetting of the inductive NGIL problem.

Incremental Learning

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

1 code implementation12 Feb 2024 Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong

Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.

Language Modelling Math

PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks

no code implementations6 Feb 2024 Junwei Su, Difan Zou, Chuan Wu

Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and memorize long-term temporal dependencies, leading to superior performance compared to memory-less counterparts.
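
A minimal sketch of a generic per-node memory update of this kind (a TGN-style GRU update; illustrative only, not the method proposed in PRES):

```python
# Generic memory-module update for a memory-based dynamic GNN:
# each node keeps a memory vector that is refreshed with a GRU cell
# whenever an event (edge) involving the node arrives.
import torch
import torch.nn as nn

num_nodes, mem_dim, msg_dim = 100, 32, 16
memory = torch.zeros(num_nodes, mem_dim)          # long-term state per node
cell = nn.GRUCell(input_size=msg_dim, hidden_size=mem_dim)

@torch.no_grad()
def update_memory(src, dst, msg):
    """Apply one interaction event (src -> dst) carrying feature vector msg."""
    for node in (src, dst):
        memory[node] = cell(msg.unsqueeze(0), memory[node].unsqueeze(0)).squeeze(0)

# Events must be processed in timestamp order; these temporal dependencies
# are what make efficient minibatch training of MDGNNs non-trivial.
events = [(0, 1, torch.randn(msg_dim)), (1, 2, torch.randn(msg_dim))]
for src, dst, msg in events:
    update_memory(src, dst, msg)
```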

GNNFlow: A Distributed Framework for Continuous Temporal GNN Learning on Dynamic Graphs

1 code implementation29 Nov 2023 Yuchen Zhong, Guangming Sheng, Tianzuo Qin, Minjie Wang, Quan Gan, Chuan Wu

We introduce GNNFlow, a distributed framework that enables efficient continuous temporal graph representation learning on dynamic graphs on multi-GPU machines.

Graph Learning Graph Representation Learning +1

DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines

2 code implementations17 Nov 2023 Chenyu Jiang, Zhen Jia, Shuai Zheng, Yida Wang, Chuan Wu

This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training.
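
A hedged sketch of the general idea behind length-aware micro-batching: a plain greedy packer under a padded-token budget (DynaPipe's actual planner is more sophisticated than this):

```python
# Toy illustration of dynamic micro-batching: group variable-length sequences
# into micro-batches so the padded size (max_len * batch_size) stays under a budget.
def pack_microbatches(seq_lengths, token_budget):
    """Greedily pack sequence lengths (longest first) into micro-batches."""
    microbatches, current = [], []
    for length in sorted(seq_lengths, reverse=True):
        candidate = current + [length]
        if max(candidate) * len(candidate) <= token_budget:
            current = candidate
        else:
            microbatches.append(current)
            current = [length]
    if current:
        microbatches.append(current)
    return microbatches

print(pack_microbatches([512, 80, 96, 400, 128, 64], token_budget=1024))
# -> [[512, 400], [128, 96, 80, 64]]
```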

Language Modelling Large Language Model +3

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

1 code implementation16 Nov 2023 Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

Considering the large space of DNN models and devices, which impedes direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices.

Domain Adaptation

Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

1 code implementation2 Jun 2023 Borui Wan, Juntao Zhao, Chuan Wu

Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming.

Quantization

A Cognitive Stimulation Dialogue System with Multi-source Knowledge Fusion for Elders with Cognitive Impairment

no code implementations14 May 2023 Jiyue Jiang, Sheng Wang, Qintong Li, Lingpeng Kong, Chuan Wu

In this paper, we propose a multi-source knowledge fusion method for CS dialogue (CSD), to generate open-ended responses guided by the CS principle and emotional support strategy.

Decoder

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform

no code implementations16 Feb 2023 Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, Wei Lin

We present Rhino, a system for accelerating tensor programs with automatic parallelization on an AI platform for real production environments.

Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment

no code implementations13 Feb 2023 Shiwei Zhang, Xiaodong Yi, Lansong Diao, Chuan Wu, Siyu Wang, Wei Lin

This paper presents TAG, an automatic system to derive an optimized DNN training graph and its deployment onto any device topology, for expedited training in device- and topology-heterogeneous ML clusters.

Combinatorial Optimization Graph Neural Network +1

On the Limitation and Experience Replay for GNNs in Continual Learning

no code implementations7 Feb 2023 Junwei Su, Difan Zou, Chuan Wu

Recent research has witnessed a surge in the exploration of Graph Neural Networks (GNNs) in Node-wise Graph Continual Learning (NGCL), a practical yet challenging paradigm involving the continual training of a GNN on node-related tasks.

Continual Learning Incremental Learning +1

MixSynthFormer: A Transformer Encoder-like Structure with Mixed Synthetic Self-attention for Efficient Human Pose Estimation

no code implementations ICCV 2023 Yuran Sun, Alan William Dougherty, Zhuoying Zhang, Yi King Choi, Chuan Wu

Human pose estimation in videos has wide-ranging practical applications across various fields, many of which require fast inference on resource-scarce devices, necessitating the development of efficient and accurate algorithms.

3D Pose Estimation motion prediction +1

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

no code implementations5 May 2022 Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo

Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

no code implementations16 Dec 2021 Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo

Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.

Graph Property Prediction Node Classification +1

Adversarial Deep Learning for Online Resource Allocation

no code implementations19 Nov 2021 Bingqian Du, Zhiyi Huang, Chuan Wu

Inspired by adversarial training in Generative Adversarial Networks (GANs) and the fact that the competitive ratio of an online algorithm is based on worst-case input, we adopt deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch, with the goal of minimizing the performance gap between the offline optimum and the learned online algorithm on worst-case input.
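
For reference, the competitive ratio targeted here compares the offline optimum against the online algorithm in the worst case; a small numeric illustration with made-up values, only to fix the definition:

```python
# Competitive ratio of an online algorithm for a maximization problem:
# worst case, over the given instances, of OPT(input) / ALG(input).
# A ratio of 1.0 would mean the online algorithm matches the offline optimum.
def competitive_ratio(instances, alg, opt):
    return max(opt(x) / alg(x) for x in instances)

instances = [[1, 3, 2], [5, 4, 1], [2, 2, 2]]   # made-up toy inputs
alg = lambda xs: xs[0]          # naive online rule: commit to the first item
opt = lambda xs: max(xs)        # offline optimum sees the whole input
print(competitive_ratio(instances, alg, opt))   # 3.0 here, driven by [1, 3, 2]
```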

Decision Making

OneFlow: Redesign the Distributed Deep Learning Framework from Scratch

1 code implementation28 Oct 2021 Jinhui Yuan, Xinqi Li, Cheng Cheng, Juncheng Liu, Ran Guo, Shenghang Cai, Chi Yao, Fei Yang, Xiaodong Yi, Chuan Wu, Haoran Zhang, Jie Zhao

Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model.
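
A rough numpy illustration of what the split, broadcast, and partial-value placements mean for a single logical tensor; this is conceptual only and does not use OneFlow's API:

```python
# SBP placements for one logical tensor across two devices (numpy stand-ins):
#   split      - each device holds a disjoint slice along some axis;
#   broadcast  - each device holds a full replica;
#   partial    - each device holds a partial value whose sum is the logical tensor.
import numpy as np

logical = np.arange(12).reshape(3, 4).astype(float)

split0 = np.array_split(logical, 2, axis=0)           # split along dim 0
broadcast = [logical.copy(), logical.copy()]          # full replicas
partial = [logical * 0.3, logical * 0.7]              # partial-sum shards

assert np.allclose(np.concatenate(split0, axis=0), logical)
assert np.allclose(broadcast[0], logical)
assert np.allclose(partial[0] + partial[1], logical)  # needs an all-reduce to recover
```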

On Locality in Graph Learning via Graph Neural Network

no code implementations29 Sep 2021 Junwei Su, Jiaqi Han, Chuan Wu

In this paper, we study how the training set in the input graph affects the performance of GNNs.

Active Learning Graph Learning +1

WN-Salience: A Corpus of News Articles with Entity Salience Annotations

no code implementations LREC 2020 Chuan Wu, Evangelos Kanoulas, Maarten de Rijke, Wei Lu

To support research on entity salience, we present a new dataset, the WikiNews Salience dataset (WN-Salience), which can be used to benchmark tasks such as entity salience detection and salient entity linking.

Entity Linking

Distributed Machine Learning through Heterogeneous Edge Systems

no code implementations16 Nov 2019 Hanpeng Hu, Dan Wang, Chuan Wu

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns.

BIG-bench Machine Learning

Characterizing Deep Learning Training Workloads on Alibaba-PAI

no code implementations14 Oct 2019 Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia

One critical issue for efficiently operating practical AI clouds is to characterize the computing and data transfer demands of these workloads, and more importantly, the training performance given the underlying software framework and hardware configurations.

DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters

1 code implementation13 Sep 2019 Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin

DL2 is a DL-driven scheduler for DL clusters that aims to expedite training jobs globally by dynamically resizing the resources allocated to each job.

Fairness reinforcement-learning +3

Online Job Scheduling in Distributed Machine Learning Clusters

no code implementations3 Jan 2018 Yixin Bao, Yanghua Peng, Chuan Wu, Zongpeng Li

In a shared cluster handling multiple training jobs, a fundamental issue is how to efficiently schedule jobs and set the number of concurrent workers to run for each job, such that server resources are maximally utilized and model training can be completed in time.

Distributed, Parallel, and Cluster Computing

Normalized Direction-preserving Adam

1 code implementation ICLR 2018 Zijun Zhang, Lin Ma, Zongpeng Li, Chuan Wu

Adaptive optimization algorithms, such as Adam and RMSprop, have shown better optimization performance than stochastic gradient descent (SGD) in some scenarios.

General Classification

Online Influence Maximization in Non-Stationary Social Networks

1 code implementation26 Apr 2016 Yixin Bao, Xiaoke Wang, Zhi Wang, Chuan Wu, Francis C. M. Lau

Nevertheless, the existing studies mostly investigate the problem on a one-off basis, assuming fixed known influence probabilities among users, or the knowledge of the exact social network topology.

Marketing
