no code implementations • 24 Jun 2025 • Travis Thompson, Seung-Hwan Lim, Paul Liu, Ruoying He, Dongkuan Xu
Large Language Models (LLMs) have achieved impressive capabilities in language understanding and generation, yet they continue to underperform on knowledge-intensive reasoning tasks due to limited access to structured context and multi-hop information.
no code implementations • 18 May 2025 • Shitong Duan, Xiaoyuan Yi, Peng Zhang, Dongkuan Xu, Jing Yao, Tun Lu, Ning Gu, Xing Xie
Assessing Large Language Models (LLMs)' underlying value differences enables comprehensive comparison of their misalignment, cultural adaptability, and biases.
1 code implementation • 26 Mar 2025 • Huanhuan Ma, Haisong Gong, Xiaoyuan Yi, Xing Xie, Dongkuan Xu
Through extensive experiments, we demonstrate that: 1) CSI effectively captures nuanced emotional patterns, revealing significant variation in LLMs across languages and contexts; 2) Compared to current approaches, CSI significantly improves reliability, yielding more consistent results; and 3) The correlation between CSI scores and the sentiment of LLMs' real-world outputs exceeds 0.85, demonstrating its strong validity in predicting LLM behavior.
1 code implementation • 18 Dec 2024 • ChengAo Shen, Zhengzhang Chen, Dongsheng Luo, Dongkuan Xu, Haifeng Chen, Jingchao Ni
The advent of Large Language Models (LLMs) has ushered in an affordable way of leveraging the semantic cues for knowledge-driven causal discovery, but the development of LLMs for causal discovery lags behind other areas, particularly in the exploration of multi-modal data.
no code implementations • 29 Jun 2024 • Zifan Zhang, Yuchen Liu, Zhiyuan Peng, Mingzhe Chen, Dongkuan Xu, Shuguang Cui
To bridge this gap, we introduce a novel digital twin-assisted optimization framework, called D-REC, which integrates reinforcement learning (RL) with diverse intervention modules to ensure reliable caching in nextG wireless networks.
no code implementations • 27 Jun 2024 • Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu
Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities.
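For readers less familiar with this setup, below is a minimal sketch of the standard autoregressive decoding loop the abstract refers to; the toy next-token model, vocabulary size, and temperature sampling are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of autoregressive decoding: each token is sampled from the
# model's predicted distribution conditioned on the context generated so far.
# `toy_next_token_probs` is a stand-in for a real LLM forward pass.
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 50

def toy_next_token_probs(context: list[int]) -> np.ndarray:
    """Placeholder for an LLM forward pass: returns a probability
    distribution over the vocabulary given the current context."""
    logits = rng.normal(size=VOCAB_SIZE) + 0.1 * len(context)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def decode(prompt: list[int], max_new_tokens: int = 10, temperature: float = 1.0) -> list[int]:
    context = list(prompt)
    for _ in range(max_new_tokens):
        probs = toy_next_token_probs(context)
        # Temperature sampling; as temperature -> 0 this approaches greedy decoding.
        scaled = probs ** (1.0 / temperature)
        scaled /= scaled.sum()
        next_token = int(rng.choice(VOCAB_SIZE, p=scaled))
        context.append(next_token)  # the new token becomes part of the context
    return context

print(decode([1, 2, 3]))
```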
1 code implementation • 7 Jun 2024 • Cong Zeng, Shengkun Tang, Xianjun Yang, Yuanzhou Chen, Yiyou Sun, Zhiqiang Xu, Yao Li, Haifeng Chen, Wei Cheng, Dongkuan Xu
However, these methods grapple with the misalignment between the distributions of the surrogate and the often undisclosed target models, leading to performance degradation, particularly with the introduction of new, closed-source models.
1 code implementation • 29 Mar 2024 • Bowen Lei, Dongkuan Xu, Ruqi Zhang, Bani Mallick
Sparse training has emerged as a promising method for resource-efficient deep neural networks (DNNs) in real-world applications.
no code implementations • 7 Mar 2024 • Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie
Big models have achieved revolutionary breakthroughs in the field of AI, but they might also pose potential concerns.
no code implementations • 29 Feb 2024 • Xukun Liu, Zhiyuan Peng, Xiaoyuan Yi, Xing Xie, Lirong Xiang, Yuchen Liu, Dongkuan Xu
While achieving remarkable progress in a broad range of tasks, large language models (LLMs) remain significantly limited in properly using massive external tools.
no code implementations • 17 Dec 2023 • Zhengdong Zhang, Zihan Dong, Yang Shi, Noboru Matsuda, Thomas Price, Dongkuan Xu
This study demonstrated that ChatGPT could generate Java programming assignment feedback that students perceived as formative.
no code implementations • 10 Dec 2023 • Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu
Transformer-based models, such as BERT, have been widely applied across natural language processing tasks.
no code implementations • 19 Oct 2023 • Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu
The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models.
no code implementations • 19 Oct 2023 • Jianwei Li, Weizhi Gao, Qi Lei, Dongkuan Xu
It is widely acknowledged that large and sparse models have higher accuracy than small and dense models under the same model size constraints.
no code implementations • 29 Sep 2023 • Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu
Unlike typical adaptive computation challenges that deal with single-step generation problems, diffusion processes with a multi-step generation need to dynamically adjust their computational resource allocation based on the ongoing assessment of each step's importance to the final image output, presenting a unique set of challenges.
1 code implementation • 8 Aug 2023 • Binfeng Xu, Xukun Liu, Hua Shen, Zeyu Han, Yuhan Li, Murong Yue, Zhiyuan Peng, Yuchen Liu, Ziyu Yao, Dongkuan Xu
We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats, prompting modules, and plugins into a unified paradigm.
1 code implementation • ICCV 2023 • Dongyao Zhu, Bowen Lei, Jie Zhang, Yanbo Fang, Ruqi Zhang, Yiqun Xie, Dongkuan Xu
Neural networks trained on distilled data often produce over-confident output and require correction by calibration methods.
1 code implementation • 19 Jul 2023 • Longfeng Wu, Bowen Lei, Dongkuan Xu, Dawei Zhou
In particular, to quantify the uncertainties in RCA, we develop a node-level uncertainty quantification algorithm to model the overlapping support regions with high uncertainty; to handle the rarity of minority classes in miscalibration calculation, we generalize the distribution-based calibration metric to the instance level and propose the first individual calibration measurement on graphs named Expected Individual Calibration Error (EICE).
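For context, the sketch below shows the standard binned expected calibration error that instance-level measures such as the proposed EICE generalize; the binning scheme and function names are illustrative, not the paper's exact definition.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: average |accuracy - confidence| per bin,
    weighted by bin size. Instance-level variants (in the spirit of EICE)
    instead score each node's calibration gap individually before averaging."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy usage: predicted confidences vs. whether each prediction was correct.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```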
1 code implementation • 1 Jul 2023 • Ziqing Wang, Qidong Zhao, Jinku Cui, Xu Liu, Dongkuan Xu
To address these limitations, we introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance Spiking Transformer architectures.
2 code implementations • 23 May 2023 • Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu
Augmented Language Models (ALMs) blend the reasoning capabilities of Large Language Models (LLMs) with tools that allow for knowledge retrieval and action execution.
no code implementations • 24 Apr 2023 • Shaoyi Huang, Haowen Fang, Kaleel Mahmood, Bowen Lei, Nuo Xu, Bin Lei, Yue Sun, Dongkuan Xu, Wujie Wen, Caiwen Ding
Experimental results show that NDSNN achieves up to 20.52% improvement in accuracy on Tiny-ImageNet using ResNet-19 (with a sparsity of 99%) as compared to other SOTA methods (e.g., Lottery Ticket Hypothesis (LTH), SET-SNN, RigL-SNN).
1 code implementation • 21 Mar 2023 • Dongsheng Luo, Wei Cheng, Yingheng Wang, Dongkuan Xu, Jingchao Ni, Wenchao Yu, Xuchao Zhang, Yanchi Liu, Yuncong Chen, Haifeng Chen, Xiang Zhang
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples, such that an encoder can be trained to learn robust and discriminative representations.
no code implementations • 27 Feb 2023 • Yue Xiang, Dongyao Zhu, Bowen Lei, Dongkuan Xu, Ruqi Zhang
Gradients have been exploited in proposal distributions to accelerate the convergence of Markov chain Monte Carlo algorithms on discrete distributions.
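As a rough illustration of the general idea (not the paper's algorithm), the sketch below uses a first-order estimate of the energy change from flipping each binary coordinate to shape a proposal distribution on a toy Ising-style model.

```python
# Gradient-informed proposal on binary variables: a first-order Taylor estimate
# of the energy change from flipping each bit shapes the proposal distribution
# (in the spirit of locally balanced / Gibbs-with-gradients samplers).
# Toy Ising-like energy; not the paper's exact method.
import numpy as np

rng = np.random.default_rng(0)
D = 8
W = rng.normal(scale=0.3, size=(D, D))
W = (W + W.T) / 2  # symmetric couplings

def energy(x):
    return -0.5 * x @ W @ x

def grad_energy(x):
    return -W @ x

def propose_flip(x):
    # Estimated energy change of flipping coordinate i: dE_i ~ grad_i * (new_i - old_i)
    delta = grad_energy(x) * (1 - 2 * x)  # x in {0,1}: a flip changes x_i by (1 - 2 x_i)
    probs = np.exp(-0.5 * delta)
    probs /= probs.sum()
    i = rng.choice(D, p=probs)
    x_new = x.copy()
    x_new[i] = 1 - x_new[i]
    return x_new

x = rng.integers(0, 2, size=D).astype(float)
for _ in range(100):
    x = propose_flip(x)  # a full sampler would add a Metropolis-Hastings correction
print(x, energy(x))
```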
1 code implementation • 18 Feb 2023 • Bowen Lei, Ruqi Zhang, Dongkuan Xu, Bani Mallick
Previous research has shown that deep neural networks tend to be over-confident, and we find that sparse training exacerbates this problem.
1 code implementation • 9 Jan 2023 • Bowen Lei, Dongkuan Xu, Ruqi Zhang, Shuren He, Bani K. Mallick
To accelerate and stabilize the convergence of sparse training, we analyze the gradient changes and develop an adaptive gradient correction method.
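A generic sketch of correcting noisy sparse-training gradients with a running average is shown below; the blending rule, constants, and toy objective are illustrative assumptions rather than the paper's exact adaptive correction.

```python
import numpy as np

rng = np.random.default_rng(0)

def ema_corrected_sparse_step(w, grad, mask, ema, lr=0.1, beta=0.9):
    """One sparse-training step with a simple gradient correction:
    blend the raw (noisy) gradient with its running average, then
    apply the update only to active (unmasked) weights."""
    ema = beta * ema + (1 - beta) * grad   # running estimate of the gradient
    corrected = 0.5 * grad + 0.5 * ema     # illustrative correction rule
    w = w - lr * corrected * mask          # masked (pruned) weights stay zero
    return w, ema

# Toy quadratic objective 0.5 * ||w - target||^2 with noisy gradients.
target = np.array([1.0, -2.0, 3.0, 0.5])
mask = np.array([1.0, 1.0, 0.0, 1.0])      # one weight pruned
w, ema = np.zeros(4), np.zeros(4)
for _ in range(200):
    grad = (w - target) + rng.normal(scale=0.5, size=4)
    w, ema = ema_corrected_sparse_step(w, grad, mask, ema)
print(w)
```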
2 code implementations • CVPR 2023 • Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu
Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones.
no code implementations • 30 Nov 2022 • Shaoyi Huang, Bowen Lei, Dongkuan Xu, Hongwu Peng, Yue Sun, Mimi Xie, Caiwen Ding
We further design an acquisition function, provide theoretical guarantees for the proposed method, and clarify its convergence property.
1 code implementation • CVPR 2023 • Shengkun Tang, Yaqing Wang, Zhenglun Kong, Tianchi Zhang, Yao Li, Caiwen Ding, Yanzhi Wang, Yi Liang, Dongkuan Xu
To handle this challenge, we propose a novel early exiting strategy for unified visual language models, which allows the model to dynamically skip layers in the encoder and decoder simultaneously based on input layer-wise similarities, with multiple early exits, namely MuE.
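The sketch below illustrates the general early-exiting idea of stopping once consecutive layer representations stop changing; the cosine-similarity criterion, threshold, and toy layers are illustrative assumptions, not the exact MuE decision rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def forward_with_early_exit(x, layers, threshold=0.99):
    """Run layers sequentially and exit once the representation stops
    changing (high similarity between consecutive layer outputs)."""
    h = x
    for depth, layer in enumerate(layers, start=1):
        h_next = np.tanh(layer @ h)
        if cosine(h, h_next) > threshold:
            return h_next, depth  # representation saturated: skip remaining layers
        h = h_next
    return h, len(layers)

# Near-identity toy layers so representations saturate after a few steps.
layers = [np.eye(16) + rng.normal(scale=0.02, size=(16, 16)) for _ in range(12)]
out, used = forward_with_early_exit(rng.normal(size=16), layers)
print(f"exited after {used} of {len(layers)} layers")
```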
no code implementations • 15 Nov 2022 • Qin Zhang, Shangsi Chen, Dongkuan Xu, Qingqing Cao, Xiaojun Chen, Trevor Cohn, Meng Fang
Thus, a trade-off between accuracy, memory consumption and processing speed is pursued.
no code implementations • 16 Jul 2022 • Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu
The degree of sparsity that can be exploited has grown as larger model sizes have been considered, following the trend of pre-training giant models.
no code implementations • 21 Jun 2022 • Shaoyi Huang, Ning Liu, Yueying Liang, Hongwu Peng, Hongjia Li, Dongkuan Xu, Mimi Xie, Caiwen Ding
On MRPC, we obtain a score 4.6 points higher than the SOTA at the same overall pruning ratio of 0.5.
no code implementations • 29 Jan 2022 • Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
Our framework AutoDistil addresses the above challenges with the following steps: (a) Incorporates inductive bias and heuristics to partition the Transformer search space into K compact sub-spaces (K=3 for typical student sizes of base, small and tiny); (b) Trains one SuperLM for each sub-space using a task-agnostic objective (e.g., self-attention distillation) with weight-sharing of students; (c) Lightweight search for the optimal student without re-training.
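A minimal sketch of step (c), the lightweight search, might look like the following; the parameter-count formula, candidate grid, and proxy score are illustrative stand-ins for evaluating weight-shared students extracted from the SuperLM, not AutoDistil's actual procedure.

```python
import itertools

def param_count(cfg):
    """Rough Transformer parameter estimate (illustrative)."""
    layers, hidden, heads = cfg
    return layers * (12 * hidden * hidden)  # attention + FFN blocks, ignoring embeddings

def proxy_score(cfg):
    """Stand-in for evaluating a weight-shared sub-network extracted from the
    SuperLM (e.g., via a task-agnostic self-attention distillation loss)."""
    layers, hidden, heads = cfg
    return layers * hidden ** 0.5  # illustrative: larger students score higher

def lightweight_search(budget_params):
    candidates = itertools.product([4, 6, 12], [256, 384, 512, 768], [4, 8, 12])
    feasible = [c for c in candidates if param_count(c) <= budget_params]
    return max(feasible, key=proxy_score) if feasible else None

# Pick the best student under a ~22M-parameter budget (no re-training needed,
# since candidates share weights with the SuperLM).
print(lightweight_search(22_000_000))
```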
no code implementations • NeurIPS 2021 • Dongkuan Xu, Wei Cheng, Dongsheng Luo, Haifeng Chen, Xiang Zhang
The key point of this framework is to follow the Information Bottleneck principle to reduce the mutual information between contrastive parts while keeping task-relevant information intact at both the level of the individual module and that of the entire framework, so that the information loss during graph representation learning can be minimized.
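The sketch below conveys the general recipe of pairing a contrastive agreement term with a penalty on redundant shared information; the cross-correlation penalty is a generic proxy for mutual information, not the paper's bound or objective.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """Standard InfoNCE: pull matched views together, push mismatched apart."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

def ib_contrastive_loss(z1, z2, beta=0.1):
    """Sketch of an Information-Bottleneck-style objective: keep task-relevant
    agreement (InfoNCE) while penalizing a crude proxy for excess mutual
    information between the two views (their mean squared cross-correlation)."""
    agreement = info_nce(z1, z2)
    z1c = z1 - z1.mean(0)
    z2c = z2 - z2.mean(0)
    cross_corr = (z1c.t() @ z2c) / z1.size(0)
    redundancy = cross_corr.pow(2).mean()
    return agreement + beta * redundancy

z1, z2 = torch.randn(32, 64), torch.randn(32, 64)
print(ib_contrastive_loss(z1, z2))
```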
no code implementations • ACL 2022 • Shaoyi Huang, Dongkuan Xu, Ian E. H. Yen, Yijue Wang, Sung-En Chang, Bingbing Li, Shiyang Chen, Mimi Xie, Sanguthevar Rajasekaran, Hang Liu, Caiwen Ding
Conventional wisdom in pruning Transformer-based language models is that pruning reduces the model expressiveness and thus is more likely to underfit rather than overfit.
no code implementations • 29 Sep 2021 • Dongsheng Luo, Wei Cheng, Yingheng Wang, Dongkuan Xu, Jingchao Ni, Wenchao Yu, Xuchao Zhang, Yanchi Liu, Haifeng Chen, Xiang Zhang
How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question.
no code implementations • ACL 2021 • Xin Dong, Yaxin Zhu, Zuohui Fu, Dongkuan Xu, Gerard de Melo
Due to recent pretrained multilingual representation models, it has become feasible to exploit labeled data from one language to train a cross-lingual model that can then be applied to multiple new languages.
Cross-Lingual Natural Language Inference
Data Augmentation
1 code implementation • NAACL 2021 • Dongkuan Xu, Ian E. H. Yen, Jinxi Zhao, Zhibin Xiao
In particular, common wisdom in CNN pruning states that sparse pruning compresses a model more effectively than reducing the number of channels and layers (Elsen et al., 2020; Zhu and Gupta, 2017), while existing work on sparse pruning of BERT yields inferior results compared to small-dense counterparts such as TinyBERT (Jiao et al., 2020).
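To make the comparison concrete, the toy example below contrasts unstructured (sparse) magnitude pruning, which zeroes individual weights, with structured compression, which removes whole channels; the matrix sizes and ratios are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))

def unstructured_prune(W, sparsity=0.75):
    """Sparse pruning: zero the smallest-magnitude individual weights."""
    k = int(W.size * sparsity)
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) > threshold, W, 0.0)

def structured_prune(W, keep_cols=2):
    """Structured compression: drop entire columns (channels) with the
    smallest L2 norm, shrinking the dense matrix itself."""
    norms = np.linalg.norm(W, axis=0)
    keep = np.sort(np.argsort(norms)[-keep_cols:])
    return W[:, keep]

print("unstructured nonzeros:", np.count_nonzero(unstructured_prune(W)))
print("structured shape:", structured_prune(W).shape)
```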
4 code implementations • NeurIPS 2020 • Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, Xiang Zhang
The unique explanation interpreting each instance independently is not sufficient to provide a global understanding of the learned GNN model, leading to a lack of generalizability and hindering it from being used in the inductive setting.
no code implementations • 29 Jul 2020 • Xin Dong, Yaxin Zhu, Yupeng Zhang, Zuohui Fu, Dongkuan Xu, Sen Yang, Gerard de Melo
The resulting model then serves as a teacher to induce labels for unlabeled target language samples that can be used during further adversarial training, allowing us to gradually adapt our model to the target language.
no code implementations • 24 May 2020 • Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant Honavar
Specifically, L-DKGPR eliminates the need for ad hoc heuristics or trial and error by using a novel adaptation of deep kernel learning that combines the expressive power of deep neural networks with the flexibility of non-parametric kernel methods.
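The snippet below sketches the core deep-kernel idea referenced here: a standard RBF kernel evaluated on features produced by a neural network, used inside ordinary GP regression. The random untrained network, data, and noise level are illustrative assumptions, not L-DKGPR itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny random feature extractor standing in for a trained deep network.
W1, W2 = rng.normal(size=(5, 16)), rng.normal(size=(16, 4))

def features(X):
    return np.tanh(X @ W1) @ W2

def deep_rbf_kernel(X, Y, lengthscale=1.0):
    """Deep kernel: an RBF kernel evaluated on learned features,
    k(x, y) = exp(-||f(x) - f(y)||^2 / (2 l^2))."""
    FX, FY = features(X), features(Y)
    sq_dists = ((FX[:, None, :] - FY[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * lengthscale ** 2))

# GP regression posterior mean with the deep kernel (noise variance 0.1).
X_train, y_train = rng.normal(size=(20, 5)), rng.normal(size=20)
X_test = rng.normal(size=(3, 5))
K = deep_rbf_kernel(X_train, X_train) + 0.1 * np.eye(20)
K_star = deep_rbf_kernel(X_test, X_train)
print(K_star @ np.linalg.solve(K, y_train))
```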
no code implementations • 1 Mar 2020 • Hua Wei, Dongkuan Xu, Junjie Liang, Zhenhui Li
To the best of our knowledge, we are the first to learn to model the state transition of moving agents with system dynamics.
1 code implementation • 11 Nov 2019 • Junjie Liang, Dongkuan Xu, Yiwei Sun, Vasant Honavar
However, the current state-of-the-art methods are unable to select the most predictive fixed effects and random effects from a large number of variables, while accounting for complex correlation structure in the data and non-linear interactions among the variables.
no code implementations • 12 Dec 2016 • Dongkuan Xu, Jia Wu, Wei Zhang, Yingjie Tian
To this end, we propose positive instance detection via graph updating for multiple instance learning, called PIGMIL, to detect TPI accurately.