1 code implementation • 29 Mar 2024 • Bowen Lei, Dongkuan Xu, Ruqi Zhang, Bani Mallick
Sparse training has emerged as a promising method for resource-efficient deep neural networks (DNNs) in real-world applications.
no code implementations • 7 Mar 2024 • Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie
Big models have achieved revolutionary breakthroughs in the field of AI, but they also pose potential concerns.
no code implementations • 29 Feb 2024 • Xukun Liu, Zhiyuan Peng, Xiaoyuan Yi, Xing Xie, Lirong Xiang, Yuchen Liu, Dongkuan Xu
While achieving remarkable progress in a broad range of tasks, large language models (LLMs) remain significantly limited in properly using massive external tools.
no code implementations • 17 Dec 2023 • Zhengdong Zhang, Zihan Dong, Yang Shi, Noboru Matsuda, Thomas Price, Dongkuan Xu
This study demonstrated that ChatGPT could generate Java programming assignment feedback that students perceived as formative.
no code implementations • 10 Dec 2023 • Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu
Transformer-based models, such as BERT, have been widely applied across a broad range of natural language processing tasks.
no code implementations • 19 Oct 2023 • Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu
The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models.
no code implementations • 19 Oct 2023 • Jianwei Li, Weizhi Gao, Qi Lei, Dongkuan Xu
It is widely acknowledged that large and sparse models have higher accuracy than small and dense models under the same model size constraints.
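As background for the large-sparse vs. small-dense comparison above, a minimal sketch of global magnitude pruning, a standard baseline for producing sparse models (an illustrative assumption, not the specific method of this paper):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries so that roughly a
    `sparsity` fraction of the weights becomes exactly zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.9, -0.01], [0.05, -1.2]])
pruned = magnitude_prune(w, 0.5)  # half the weights removed
```

Under the same parameter budget, such a pruned large model is what the entry compares against a small dense model of equal nonzero size.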
no code implementations • 29 Sep 2023 • Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu
In this work, we propose DeeDiff, an early exiting framework that adaptively allocates computation resources in each sampling step to improve the generation efficiency of diffusion models.
1 code implementation • 8 Aug 2023 • Binfeng Xu, Xukun Liu, Hua Shen, Zeyu Han, Yuhan Li, Murong Yue, Zhiyuan Peng, Yuchen Liu, Ziyu Yao, Dongkuan Xu
We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats, prompting modules, and plugins into a unified paradigm.
1 code implementation • ICCV 2023 • Dongyao Zhu, Bowen Lei, Jie Zhang, Yanbo Fang, Ruqi Zhang, Yiqun Xie, Dongkuan Xu
Neural networks trained on distilled data often produce over-confident output and require correction by calibration methods.
1 code implementation • 19 Jul 2023 • Longfeng Wu, Bowen Lei, Dongkuan Xu, Dawei Zhou
In particular, to quantify the uncertainties in RCA, we develop a node-level uncertainty quantification algorithm to model the overlapping support regions with high uncertainty. To handle the rarity of minority classes in miscalibration calculation, we generalize the distribution-based calibration metric to the instance level and propose the first individual calibration measurement on graphs, named Expected Individual Calibration Error (EICE).
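For context, the distribution-based calibration metric that EICE generalizes to the instance level is the standard binned Expected Calibration Error, which averages the gap between accuracy and confidence over confidence bins. A minimal sketch (the bin count and weighting scheme are illustrative assumptions):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average over confidence bins of
    |mean accuracy - mean confidence| within each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            weight = in_bin.mean()  # fraction of samples in this bin
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += weight * gap
    return ece

# Over-confident predictions (high confidence, mixed correctness) raise ECE.
ece = expected_calibration_error([0.9, 0.8, 0.95, 0.6], [1, 1, 0, 1])
```

Because this metric pools samples into bins, rare minority-class nodes contribute little to it, which motivates the instance-level generalization described above.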
1 code implementation • 1 Jul 2023 • Ziqing Wang, Qidong Zhao, Jinku Cui, Xu Liu, Dongkuan Xu
To address these limitations, we introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance Spiking Transformer architectures.
2 code implementations • 23 May 2023 • Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu
Augmented Language Models (ALMs) blend the reasoning capabilities of Large Language Models (LLMs) with tools that allow for knowledge retrieval and action execution.
no code implementations • 24 Apr 2023 • Shaoyi Huang, Haowen Fang, Kaleel Mahmood, Bowen Lei, Nuo Xu, Bin Lei, Yue Sun, Dongkuan Xu, Wujie Wen, Caiwen Ding
Experimental results show that NDSNN achieves up to 20.52% improvement in accuracy on Tiny-ImageNet using ResNet-19 (with a sparsity of 99%) as compared to other SOTA methods (e.g., Lottery Ticket Hypothesis (LTH), SET-SNN, RigL-SNN).
1 code implementation • 21 Mar 2023 • Dongsheng Luo, Wei Cheng, Yingheng Wang, Dongkuan Xu, Jingchao Ni, Wenchao Yu, Xuchao Zhang, Yanchi Liu, Yuncong Chen, Haifeng Chen, Xiang Zhang
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples, such that an encoder can be trained to learn robust and discriminative representations.
no code implementations • 27 Feb 2023 • Yue Xiang, Dongyao Zhu, Bowen Lei, Dongkuan Xu, Ruqi Zhang
Gradients have been exploited in proposal distributions to accelerate the convergence of Markov chain Monte Carlo algorithms on discrete distributions.
1 code implementation • 18 Feb 2023 • Bowen Lei, Ruqi Zhang, Dongkuan Xu, Bani Mallick
Previous research has shown that deep neural networks tend to be over-confident, and we find that sparse training exacerbates this problem.
1 code implementation • 9 Jan 2023 • Bowen Lei, Dongkuan Xu, Ruqi Zhang, Shuren He, Bani K. Mallick
To accelerate and stabilize the convergence of sparse training, we analyze the gradient changes and develop an adaptive gradient correction method.
2 code implementations • CVPR 2023 • Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu
Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones.
no code implementations • 30 Nov 2022 • Shaoyi Huang, Bowen Lei, Dongkuan Xu, Hongwu Peng, Yue Sun, Mimi Xie, Caiwen Ding
We further design an acquisition function, provide theoretical guarantees for the proposed method, and clarify its convergence property.
1 code implementation • CVPR 2023 • Shengkun Tang, Yaqing Wang, Zhenglun Kong, Tianchi Zhang, Yao Li, Caiwen Ding, Yanzhi Wang, Yi Liang, Dongkuan Xu
To handle this challenge, we propose a novel early exiting strategy for unified visual language models, named MuE, which dynamically skips layers in both the encoder and the decoder based on layer-wise input similarities and supports multiple early exits.
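The idea of exiting once layer inputs stop changing can be sketched with cosine similarity between consecutive hidden states as a saturation signal (the threshold value and toy layers below are illustrative assumptions, not the MuE implementation):

```python
import numpy as np

def run_with_early_exit(layers, x, threshold=0.99):
    """Apply layers in order, exiting once consecutive hidden states
    become nearly parallel (cosine similarity above threshold)."""
    hidden = x
    for depth, layer in enumerate(layers, start=1):
        new_hidden = layer(hidden)
        cos = np.dot(hidden.ravel(), new_hidden.ravel()) / (
            np.linalg.norm(hidden) * np.linalg.norm(new_hidden) + 1e-12)
        hidden = new_hidden
        if cos > threshold:  # representation has saturated: exit early
            return hidden, depth
    return hidden, len(layers)

# Toy layers: the second one barely changes its input, triggering an exit
# before the third layer is ever evaluated.
layers = [lambda h: h[::-1], lambda h: h * 1.0001, lambda h: -h]
out, depth_used = run_with_early_exit(layers, np.array([1.0, 2.0, 3.0]))
```

Skipping the remaining layers for "easy" inputs is what yields the per-input compute savings such strategies target.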
no code implementations • 15 Nov 2022 • Qin Zhang, Shangsi Chen, Dongkuan Xu, Qingqing Cao, Xiaojun Chen, Trevor Cohn, Meng Fang
Thus, a trade-off between accuracy, memory consumption and processing speed is pursued.
no code implementations • 16 Jul 2022 • Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu
The degree of sparsity one can exploit has also grown as larger model sizes have been considered, following the trend of pre-training giant models.
no code implementations • 21 Jun 2022 • Shaoyi Huang, Ning Liu, Yueying Liang, Hongwu Peng, Hongjia Li, Dongkuan Xu, Mimi Xie, Caiwen Ding
On MRPC, we obtain a 4.6 higher score than the SOTA at the same overall pruning ratio of 0.5.
no code implementations • 29 Jan 2022 • Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
Our framework AutoDistil addresses the above challenges with the following steps: (a) it incorporates inductive bias and heuristics to partition the Transformer search space into K compact sub-spaces (K=3 for typical student sizes of base, small, and tiny); (b) it trains one SuperLM for each sub-space using a task-agnostic objective (e.g., self-attention distillation) with weight-sharing of students; (c) it performs a lightweight search for the optimal student without re-training.
no code implementations • NeurIPS 2021 • Dongkuan Xu, Wei Cheng, Dongsheng Luo, Haifeng Chen, Xiang Zhang
The key point of this framework is to follow the Information Bottleneck principle: reduce the mutual information between contrastive parts while keeping task-relevant information intact, at both the individual-module level and the whole-framework level, so that the information loss during graph representation learning is minimized.
no code implementations • ACL 2022 • Shaoyi Huang, Dongkuan Xu, Ian E. H. Yen, Yijue Wang, Sung-En Chang, Bingbing Li, Shiyang Chen, Mimi Xie, Sanguthevar Rajasekaran, Hang Liu, Caiwen Ding
Conventional wisdom in pruning Transformer-based language models is that pruning reduces the model expressiveness and thus is more likely to underfit rather than overfit.
no code implementations • 29 Sep 2021 • Dongsheng Luo, Wei Cheng, Yingheng Wang, Dongkuan Xu, Jingchao Ni, Wenchao Yu, Xuchao Zhang, Yanchi Liu, Haifeng Chen, Xiang Zhang
How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question.
no code implementations • ACL 2021 • Xin Dong, Yaxin Zhu, Zuohui Fu, Dongkuan Xu, Gerard de Melo
Due to recent pretrained multilingual representation models, it has become feasible to exploit labeled data from one language to train a cross-lingual model that can then be applied to multiple new languages.
1 code implementation • NAACL 2021 • Dongkuan Xu, Ian E. H. Yen, Jinxi Zhao, Zhibin Xiao
In particular, common wisdom in CNN pruning states that sparse pruning compresses a model more than reducing the number of channels and layers does (Elsen et al., 2020; Zhu and Gupta, 2017), whereas existing work on sparse pruning of BERT yields inferior results to small-dense counterparts such as TinyBERT (Jiao et al., 2020).
3 code implementations • NeurIPS 2020 • Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, Xiang Zhang
Explanations that interpret each instance independently are not sufficient to provide a global understanding of the learned GNN model, leading to a lack of generalizability and hindering its use in the inductive setting.
no code implementations • 29 Jul 2020 • Xin Dong, Yaxin Zhu, Yupeng Zhang, Zuohui Fu, Dongkuan Xu, Sen yang, Gerard de Melo
The resulting model then serves as a teacher to induce labels for unlabeled target language samples that can be used during further adversarial training, allowing us to gradually adapt our model to the target language.
no code implementations • 24 May 2020 • Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant Honavar
Specifically, L-DKGPR eliminates the need for ad hoc heuristics or trial and error using a novel adaptation of deep kernel learning that combines the expressive power of deep neural networks with the flexibility of non-parametric kernel methods.
no code implementations • 1 Mar 2020 • Hua Wei, Dongkuan Xu, Junjie Liang, Zhenhui Li
To the best of our knowledge, we are the first to learn to model the state transition of moving agents with system dynamics.
1 code implementation • 11 Nov 2019 • Junjie Liang, Dongkuan Xu, Yiwei Sun, Vasant Honavar
However, the current state-of-the-art methods are unable to select the most predictive fixed effects and random effects from a large number of variables, while accounting for complex correlation structure in the data and non-linear interactions among the variables.
no code implementations • 12 Dec 2016 • Dongkuan Xu, Jia Wu, Wei zhang, Yingjie Tian
To this end, we propose positive instance detection via graph updating for multiple instance learning, called PIGMIL, to detect TPI accurately.