no code implementations • 1 Oct 2024 • Sheng Wang, Liheng Chen, Pengan Chen, Jingwei Dong, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu
The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously.
no code implementations • 28 Sep 2024 • Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu
Traditional RL can be modeled as a dataflow, where each node represents the computation of a neural network (NN) and each edge denotes a data dependency between the NNs.
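As a rough illustration of this dataflow view (not the system proposed in the paper), the sketch below encodes a hypothetical RLHF-style pipeline as a directed graph and derives a valid execution order; node names such as actor.generate and reward_model.score are assumptions made for illustration.

```python
# Illustrative only: a toy dataflow graph for an RLHF-style pipeline, where each
# node stands for one neural network's computation and each edge for a data
# dependency. Node and edge names are hypothetical, not from the paper.
from collections import defaultdict

class Dataflow:
    def __init__(self):
        self.edges = defaultdict(list)   # node -> list of downstream nodes

    def add_dependency(self, producer, consumer):
        self.edges[producer].append(consumer)

    def topological_order(self):
        # Kahn's algorithm: one valid execution order of the NN computations.
        indeg = defaultdict(int)
        nodes = set(self.edges)
        for src, dsts in self.edges.items():
            for dst in dsts:
                indeg[dst] += 1
                nodes.add(dst)
        ready = [n for n in nodes if indeg[n] == 0]
        order = []
        while ready:
            n = ready.pop()
            order.append(n)
            for m in self.edges[n]:
                indeg[m] -= 1
                if indeg[m] == 0:
                    ready.append(m)
        return order

flow = Dataflow()
flow.add_dependency("actor.generate", "reward_model.score")
flow.add_dependency("actor.generate", "critic.value")
flow.add_dependency("reward_model.score", "actor.update")
flow.add_dependency("critic.value", "actor.update")
print(flow.topological_order())
```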
no code implementations • 29 Aug 2024 • Jiyue Jiang, Liheng Chen, Pengan Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu
The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages.
no code implementations • 29 Jul 2024 • Borui Wan, Mingji Han, Yiyao Sheng, Yanghua Peng, Haibin Lin, Mofan Zhang, Zhichao Lai, Menghan Yu, Junda Zhang, Zuquan Song, Xin Liu, Chuan Wu
In production, different LFMs are trained with various frameworks and storage backends, depending on model sizes and training scales.
1 code implementation • 2 Jul 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu
A number of production deep learning clusters have attempted to explore inference hardware for DNN training, at off-peak serving hours when many inference GPUs are idle.
no code implementations • 24 Jun 2024 • Jiyue Jiang, Liheng Chen, Sheng Wang, Lingpeng Kong, Yu Li, Chuan Wu
The thought generated by the progressive thought generator serves as a prompt to prevent the generated dialogue from having significant semantic deviations, while the psychology knowledge generator produces psychological knowledge to serve as the dialogue history for the LLM, guiding the dialogue generator to create multi-turn psychological dialogue.
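A minimal sketch of how such a multi-generator pipeline could be wired together is given below, assuming only a generic llm(prompt) text-completion callable; the prompt templates and function names are hypothetical and not taken from the paper.

```python
# A minimal sketch of the described multi-generator pipeline, assuming a generic
# `llm(prompt)` text-completion callable; prompts and names are illustrative only.
def generate_dialogue(llm, seed_topic, num_turns=3):
    history = []
    for turn in range(num_turns):
        # Progressive thought generator: a guiding thought used as a prompt
        # to keep the dialogue from drifting semantically.
        thought = llm(f"Topic: {seed_topic}\nDialogue so far: {history}\n"
                      "Write a short guiding thought for the next exchange.")
        # Psychology knowledge generator: knowledge injected as dialogue history.
        knowledge = llm(f"Thought: {thought}\n"
                        "List relevant psychological knowledge for this exchange.")
        history.append({"role": "knowledge", "content": knowledge})
        # Dialogue generator: produces the next client/counselor exchange.
        exchange = llm(f"Thought: {thought}\nHistory: {history}\n"
                       "Write the next client utterance and counselor reply.")
        history.append({"role": "dialogue", "content": exchange})
    return history
```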
no code implementations • 30 Apr 2024 • Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
The Mixture-of-Experts (MoE) technique plays a crucial role in expanding the size of DNN model parameters.
no code implementations • 13 Mar 2024 • Junwei Su, Lingjun Mao, Chuan Wu
Many computer vision and machine learning problems are modelled as learning tasks on heterogeneous graphs, featuring a wide array of relations from diverse types of nodes and edges.
no code implementations • 13 Mar 2024 • Junwei Su, Difan Zou, Chuan Wu
In this paper, we study the generalization performance of SGD with preconditioning for the least squares problem.
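For orientation, one common way to write this setting is the following; the notation is assumed for illustration and the paper's exact formulation may differ.

```latex
% One common formulation of preconditioned SGD for least squares
% (notation assumed for illustration; the paper's setup may differ).
\[
L(w) \;=\; \frac{1}{2n}\,\lVert Xw - y \rVert_2^{2},
\qquad
w_{t+1} \;=\; w_t \;-\; \eta\, P\, \big(x_{i_t} x_{i_t}^{\top} w_t - y_{i_t} x_{i_t}\big),
\]
where $i_t$ is sampled uniformly from $\{1,\dots,n\}$, $\eta$ is the step size,
and $P \succ 0$ is a fixed preconditioning matrix ($P = I$ recovers plain SGD).
```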
no code implementations • 7 Mar 2024 • Junwei Su, Chuan Wu
Many computer vision and machine learning problems are modelled as learning tasks on graphs, where graph neural networks (GNNs) have emerged as a dominant tool for learning representations of graph-structured data. A key feature of GNNs is their use of graph structures as input, enabling them to exploit the graph's inherent topological properties, known as the topology awareness of GNNs. Despite the empirical successes of GNNs, the influence of topology awareness on generalization performance remains unexplored, particularly for node-level tasks that diverge from the assumption of data being independent and identically distributed (IID). The precise definition and characterization of the topology awareness of GNNs, especially concerning different topological features, are still unclear. This paper introduces a comprehensive framework to characterize the topology awareness of GNNs across any topological feature. Using this framework, we investigate the effects of topology awareness on GNN generalization performance. Contrary to the prevailing belief that enhancing the topology awareness of GNNs is always advantageous, our analysis reveals a critical insight: improving the topology awareness of GNNs may inadvertently lead to unfair generalization across structural groups, which might not be desired in some scenarios. Additionally, we conduct a case study using the intrinsic graph metric, the shortest-path distance, on various benchmark datasets. The empirical results of this case study confirm our theoretical insights. Moreover, we demonstrate the practical applicability of our framework by using it to tackle the cold-start problem in graph active learning.
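The shortest-path-distance case study can be mimicked with a generic recipe like the one below, which groups test nodes by their hop distance to the labeled training set and compares accuracy across the resulting structural groups; this is an illustrative sketch, not the paper's code, and graph, train_nodes, preds, and labels are assumed inputs.

```python
# Illustrative sketch: group test nodes by hop distance to the training set and
# compare accuracy across the resulting structural groups. Generic recipe only;
# `graph`, `train_nodes`, `preds`, `labels` are assumed inputs.
import networkx as nx
from collections import defaultdict

def accuracy_by_distance(graph, train_nodes, preds, labels):
    # Distance of every reachable node to the nearest training node.
    dist = nx.multi_source_dijkstra_path_length(graph, set(train_nodes))
    correct, total = defaultdict(int), defaultdict(int)
    for node, d in dist.items():
        if node in train_nodes or node not in preds:
            continue
        total[d] += 1
        correct[d] += int(preds[node] == labels[node])
    return {d: correct[d] / total[d] for d in sorted(total)}
```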
1 code implementation • 2 Mar 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu
The immense sizes of LLMs have led to very high resource demand and cost for running the models.
no code implementations • 25 Feb 2024 • Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu
Hence, a possible contradiction arises between the negligible number of trainable parameters in LoRA and the effectiveness of previous dropout methods, which has been largely overlooked.
1 code implementation • 24 Feb 2024 • Sheng Wang, Boyang Xue, Jiacheng Ye, Jiyue Jiang, Liheng Chen, Lingpeng Kong, Chuan Wu
We hope that the conspicuously higher parameter efficiency can establish PRoLoRA as a resource-friendly alternative to LoRA.
1 code implementation • 23 Feb 2024 • Guangming Sheng, Junwei Su, Chao Huang, Chuan Wu
However, to obtain up-to-date information, the iterative reading and updating of the memory module in MTGNNs must follow the temporal dependencies.
no code implementations • 20 Feb 2024 • Junwei Su, Difan Zou, Zijun Zhang, Chuan Wu
We provide a formal formulation and analysis of the problem, and propose a novel regularization-based technique called Structural-Shift-Risk-Mitigation (SSRM) to mitigate the impact of structural shift on catastrophic forgetting in the inductive NGIL problem.
1 code implementation • 12 Feb 2024 • Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong
Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.
no code implementations • 6 Feb 2024 • Junwei Su, Difan Zou, Chuan Wu
Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and memorize long-term temporal dependencies, leading to superior performance compared to memory-less counterparts.
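As a generic illustration of such a memory module (in the spirit of TGN-style designs, not the method analyzed in the paper), the sketch below keeps a per-node memory vector and updates it with a GRU cell whenever new event messages arrive.

```python
# A generic memory-module update in the spirit of memory-based dynamic GNNs
# (TGN-style); dimensions and message construction are illustrative only.
import torch
import torch.nn as nn

class NodeMemory(nn.Module):
    def __init__(self, num_nodes, msg_dim, mem_dim):
        super().__init__()
        self.register_buffer("memory", torch.zeros(num_nodes, mem_dim))
        self.updater = nn.GRUCell(msg_dim, mem_dim)

    def read(self, nodes):
        # Read the current long-term state of the given nodes.
        return self.memory[nodes]

    def write(self, nodes, messages):
        # Events must be applied in temporal order so that each read sees the
        # memory produced by all earlier events of the same node.
        updated = self.updater(messages, self.memory[nodes])
        self.memory[nodes] = updated.detach()
```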
1 code implementation • 29 Nov 2023 • Yuchen Zhong, Guangming Sheng, Tianzuo Qin, Minjie Wang, Quan Gan, Chuan Wu
We introduce GNNFlow, a distributed framework that enables efficient continuous temporal graph representation learning on dynamic graphs on multi-GPU machines.
2 code implementations • 17 Nov 2023 • Chenyu Jiang, Zhen Jia, Shuai Zheng, Yida Wang, Chuan Wu
This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training.
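A minimal sketch of the dynamic micro-batching idea, under the illustrative assumption of a fixed per-micro-batch token budget, is shown below: sequences are sorted by length and packed so that the padded cost of each micro-batch stays under the budget.

```python
# A minimal sketch of dynamic micro-batching: pack length-sorted sequences into
# micro-batches under a fixed token budget, so padding (and per-micro-batch
# memory) stays roughly balanced. Budget and policy are illustrative assumptions.
def dynamic_micro_batches(seq_lengths, token_budget=4096):
    order = sorted(range(len(seq_lengths)), key=lambda i: seq_lengths[i])
    batches, current, current_max = [], [], 0
    for idx in order:
        new_max = max(current_max, seq_lengths[idx])
        # Padded cost of the micro-batch if this sequence were added.
        if current and new_max * (len(current) + 1) > token_budget:
            batches.append(current)
            current, current_max = [], 0
            new_max = seq_lengths[idx]
        current.append(idx)
        current_max = new_max
    if current:
        batches.append(current)
    return batches

print(dynamic_micro_batches([120, 900, 64, 2048, 512, 300], token_budget=2048))
```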
1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu
Considering the large space of DNN models and devices that impede direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices.
1 code implementation • 2 Jun 2023 • Borui Wan, Juntao Zhao, Chuan Wu
Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming.
no code implementations • 14 May 2023 • Jiyue Jiang, Sheng Wang, Qintong Li, Lingpeng Kong, Chuan Wu
In this paper, we propose a multi-source knowledge fusion method for CS dialogue (CSD), to generate open-ended responses guided by the CS principle and emotional support strategy.
no code implementations • 16 Feb 2023 • Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, Wei Lin
We present Rhino, a system for accelerating tensor programs with automatic parallelization on an AI platform for real production environments.
no code implementations • 13 Feb 2023 • Shiwei Zhang, Xiaodong Yi, Lansong Diao, Chuan Wu, Siyu Wang, Wei Lin
This paper presents TAG, an automatic system that derives an optimized DNN training graph and its deployment onto any device topology, for expedited training in device- and topology-heterogeneous ML clusters.
no code implementations • 7 Feb 2023 • Junwei Su, Difan Zou, Chuan Wu
Recent research has witnessed a surge in the exploration of Graph Neural Networks (GNN) in Node-wise Graph Continual Learning (NGCL), a practical yet challenging paradigm involving the continual training of a GNN on node-related tasks.
no code implementations • ICCV 2023 • Yuran Sun, Alan William Dougherty, Zhuoying Zhang, Yi King Choi, Chuan Wu
Human pose estimation in videos has wide-ranging practical applications across various fields, many of which require fast inference on resource-scarce devices, necessitating the development of efficient and accurate algorithms.
no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.
no code implementations • 16 Dec 2021 • Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.
no code implementations • 19 Nov 2021 • Bingqian Du, Zhiyi Huang, Chuan Wu
Inspired by adversarial training in Generative Adversarial Nets (GANs) and the fact that the competitive ratio of an online algorithm is based on worst-case input, we adopt deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch, with the goal of minimizing the performance gap between the offline optimum and the learned online algorithm on worst-case input.
1 code implementation • 28 Oct 2021 • Jinhui Yuan, Xinqi Li, Cheng Cheng, Juncheng Liu, Ran Guo, Shenghang Cai, Chi Yao, Fei Yang, Xiaodong Yi, Chuan Wu, Haoran Zhang, Jie Zhao
Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model.
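To make the SBP abstraction concrete, the toy example below shows three ways the same logical tensor can be laid out across two devices; it illustrates only the semantics of each placement and is not OneFlow's API.

```python
# Conceptual illustration of the SBP (split, broadcast, partial-value) idea with
# numpy: three placements of the same logical tensor across two "devices".
# This is not OneFlow's API, just the semantics of each placement.
import numpy as np

logical = np.arange(8).reshape(2, 4).astype(float)

# S(1): split along axis 1 -- each device holds a disjoint slice.
split = np.split(logical, 2, axis=1)

# B: broadcast -- each device holds a full, identical copy.
broadcast = [logical.copy(), logical.copy()]

# P: partial-value -- each device holds a partial tensor; the element-wise
# sum over devices reconstructs the logical tensor.
partial = [logical * 0.25, logical * 0.75]

assert np.allclose(np.concatenate(split, axis=1), logical)
assert all(np.allclose(copy, logical) for copy in broadcast)
assert np.allclose(partial[0] + partial[1], logical)
```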
no code implementations • 29 Sep 2021 • Junwei Su, Jiaqi Han, Chuan Wu
In this paper, we study how the training set in the input graph affects the performance of GNNs.
no code implementations • LREC 2020 • Chuan Wu, Evangelos Kanoulas, Maarten de Rijke, Wei Lu
To support research on entity salience, we present a new dataset, the WikiNews Salience dataset (WN-Salience), which can be used to benchmark tasks such as entity salience detection and salient entity linking.
no code implementations • 16 Nov 2019 • Hanpeng Hu, Dan Wang, Chuan Wu
Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns.
no code implementations • 14 Oct 2019 • Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia
One critical issue for efficiently operating practical AI clouds is to characterize the computing and data transfer demands of these workloads, and more importantly, the training performance given the underlying software framework and hardware configurations.
1 code implementation • 13 Sep 2019 • Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin
DL2 is a DL-driven scheduler for DL clusters that aims to expedite training jobs globally by dynamically resizing the resources allocated to each job.
no code implementations • 3 Jan 2018 • Yixin Bao, Yanghua Peng, Chuan Wu, Zongpeng Li
In a shared cluster handling multiple training jobs, a fundamental issue is how to efficiently schedule jobs and set the number of concurrent workers to run for each job, such that server resources are maximally utilized and model training can be completed in time.
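A deliberately simplified baseline for this decision (not the paper's algorithm) is sketched below: every queued job gets one worker, and each spare worker goes to the job with the most remaining work per already-assigned worker.

```python
# A deliberately simplified greedy allocation (not the paper's algorithm):
# give each job one worker, then assign leftover workers to the job with the
# most remaining epochs per assigned worker. Assumes total_workers >= #jobs.
def allocate_workers(remaining_epochs, total_workers):
    jobs = list(remaining_epochs)
    alloc = {j: 1 for j in jobs}          # every job gets at least one worker
    spare = total_workers - len(jobs)
    while spare > 0:
        j = max(jobs, key=lambda j: remaining_epochs[j] / alloc[j])
        alloc[j] += 1
        spare -= 1
    return alloc

print(allocate_workers({"jobA": 80, "jobB": 20, "jobC": 50}, total_workers=6))
```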
1 code implementation • ICLR 2018 • Zijun Zhang, Lin Ma, Zongpeng Li, Chuan Wu
Adaptive optimization algorithms, such as Adam and RMSprop, have shown better optimization performance than stochastic gradient descent (SGD) in some scenarios.
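For reference, the standard Adam update that this comparison refers to is:

```latex
% The standard Adam update (for reference; notation follows the original Adam paper).
\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2},\\
\hat m_t &= \frac{m_t}{1-\beta_1^{t}}, &
\hat v_t &= \frac{v_t}{1-\beta_2^{t}},\\
\theta_t &= \theta_{t-1} - \frac{\eta\, \hat m_t}{\sqrt{\hat v_t} + \epsilon},
\end{aligned}
\]
versus plain SGD, $\theta_t = \theta_{t-1} - \eta\, g_t$, which applies the same
step size to every coordinate.
```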
1 code implementation • 26 Apr 2016 • Yixin Bao, Xiaoke Wang, Zhi Wang, Chuan Wu, Francis C. M. Lau
Nevertheless, the existing studies mostly investigate the problem on a one-off basis, assuming fixed known influence probabilities among users, or the knowledge of the exact social network topology.