no code implementations • WMT (EMNLP) 2020 • Longyue Wang, Zhaopeng Tu, Xing Wang, Li Ding, Liang Ding, Shuming Shi
This paper describes the Tencent AI Lab's submission to the WMT 2020 shared task on English-German chat translation.
no code implementations • ACL (IWSLT) 2021 • Liang Ding, DaCheng Tao
Our constrained system is based on a pipeline framework, i.e., ASR followed by NMT.
1 code implementation • ACL 2022 • Liang Ding, Longyue Wang, Shuming Shi, DaCheng Tao, Zhaopeng Tu
In this work, we provide an appealing alternative for NAT – monolingual KD, which trains the NAT student on external monolingual data with an AT teacher trained on the original bilingual data.
no code implementations • 16 Apr 2025 • Shizhan Cai, Liang Ding, DaCheng Tao
The rapid development of Large Language Models (LLMs) has intensified concerns about content traceability and potential misuse.
no code implementations • 12 Apr 2025 • Yikun Wang, Siyin Wang, Qinyuan Cheng, Zhaoye Fei, Liang Ding, Qipeng Guo, DaCheng Tao, Xipeng Qiu
Recent advancements in Large Vision-Language Models have showcased remarkable capabilities.
1 code implementation • 24 Mar 2025 • Zhexuan Wang, Yutong Wang, Xuebo Liu, Liang Ding, Miao Zhang, Jie Liu, Min Zhang
Multi-agent systems (MAS) based on large language models (LLMs) have demonstrated significant potential in collaborative problem-solving.
1 code implementation • 3 Mar 2025 • Wenbin Wang, Yongcheng Jing, Liang Ding, Yingjie Wang, Li Shen, Yong Luo, Bo Du, DaCheng Tao
High-resolution (HR) image perception remains a key challenge in multimodal large language models (MLLMs).
no code implementations • 20 Feb 2025 • Yuchen Wu, Liang Ding, Li Shen, DaCheng Tao
Knowledge editing allows for efficient adaptation of large language models (LLMs) to new information or corrections without requiring full retraining.
no code implementations • 19 Feb 2025 • Keqin Peng, Liang Ding, Yuanxin Ouyang, Meng Fang, Yancheng Yuan, DaCheng Tao
Large language models (LLMs) excel at a range of tasks through in-context learning (ICL), where only a few task examples guide their predictions.
1 code implementation • 6 Feb 2025 • Shaopeng Fu, Liang Ding, Di Wang
This paper focuses on adversarial suffix jailbreak attacks and unveils that to defend against a jailbreak attack with an adversarial suffix of length $\Theta(M)$, it is enough to align LLMs on prompts with adversarial suffixes of length $\Theta(\sqrt{M})$.
no code implementations • 31 Jan 2025 • Yan Sun, Tiansheng Huang, Liang Ding, Li Shen, DaCheng Tao
Zeroth-order optimization (ZO) has demonstrated remarkable promise in efficient fine-tuning tasks for Large Language Models (LLMs).
no code implementations • 31 Jan 2025 • Yuchun Miao, Sen Zhang, Liang Ding, Yuqi Zhang, Lefei Zhang, DaCheng Tao
This work identifies the Energy Loss Phenomenon in Reinforcement Learning from Human Feedback (RLHF) and its connection to reward hacking.
no code implementations • 14 Jan 2025 • Shuai Wang, Liang Ding, Yibing Zhan, Yong Luo, Zheng He, Dapeng Tao
Automated code generation using large language models (LLMs) has gained attention due to its efficiency and adaptability.
no code implementations • 19 Dec 2024 • Yuncheng Song, Liang Ding, Changtong Zan, ShuJian Huang
Knowledge distillation (KD) has shown great promise in transferring knowledge from larger teacher models to smaller student models.
no code implementations • 19 Dec 2024 • Xiabin Zhou, Wenbin Wang, Minyan Zeng, Jiaxian Guo, Xuebo Liu, Li Shen, Min Zhang, Liang Ding
Efficient KV cache management in LLMs is crucial for long-context tasks like RAG and summarization.
no code implementations • 23 Oct 2024 • Xintong Wang, Jingheng Pan, Longqin Jiang, Liang Ding, Xingshan Li, Chris Biemann
Despite their impressive capabilities, large language models (LLMs) often lack interpretability and can generate toxic content.
no code implementations • 15 Oct 2024 • Qihuang Zhong, Kunfeng Chen, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Large Language Models (LLMs) have shown promising performance in text-to-SQL, which involves translating natural language questions into SQL queries.
no code implementations • 13 Oct 2024 • Fei Wang, Li Shen, Liang Ding, Chao Xue, Ye Liu, Changxing Ding
By revisiting the Memory-efficient ZO (MeZO) optimizer, we discover that the full-parameter perturbation and updating processes consume over 50% of its overall fine-tuning time cost.
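To make the bottleneck concrete, below is a minimal sketch of a MeZO-style zeroth-order step (two-point SPSA with a reused random seed); the whole-model perturbation and update loops are exactly the passes said to dominate the time cost. This is an illustrative reconstruction on CPU, not the authors' code, and `model`, `loss_fn`, and `batch` are assumed placeholders.

```python
# Minimal sketch of a MeZO-style zeroth-order step (two-point SPSA estimate).
# Illustrative reconstruction, not the paper's implementation.
import torch

def mezo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6, seed=0):
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        g = torch.Generator().manual_seed(seed)        # same seed => same z each call
        for p in params:                               # full-parameter perturbation:
            z = torch.randn(p.shape, generator=g)      # these whole-model loops are
            p.data.add_(scale * eps * z)               # the cost the paper targets

    perturb(+1.0); loss_plus = loss_fn(model, batch)   # f(theta + eps * z)
    perturb(-2.0); loss_minus = loss_fn(model, batch)  # f(theta - eps * z)
    perturb(+1.0)                                      # restore theta

    grad_scale = (loss_plus - loss_minus) / (2 * eps)  # scalar projected-gradient estimate
    g = torch.Generator().manual_seed(seed)
    for p in params:                                   # full-parameter update pass
        z = torch.randn(p.shape, generator=g)
        p.data.add_(-lr * float(grad_scale) * z)
    return float(loss_plus)
```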
1 code implementation • 4 Oct 2024 • Tengfei Yu, Xuebo Liu, Zhiyi Hou, Liang Ding, DaCheng Tao, Min Zhang
This study aims to refine the use of speech datasets for LSM training by addressing the limitations of vanilla instruction tuning.
no code implementations • 23 Sep 2024 • Bin Hong, Jinze Wu, Jiayu Liu, Liang Ding, Jing Sha, Kai Zhang, Shijin Wang, Zhenya Huang
In recent years, the breakthrough of Large Language Models (LLMs) offers new ideas for achieving universal methods on graph data.
1 code implementation • 22 Sep 2024 • Qingyu Lu, Liang Ding, Kanjian Zhang, Jinxia Zhang, DaCheng Tao
To enhance the quality of error annotations predicted by LLM evaluators, we introduce MQM-APE, a universal and training-free framework that filters out non-impactful errors by automatically post-editing (APE) the original translation according to each error, keeping only those errors that contribute to quality improvement.
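A minimal sketch of this filtering idea follows; it is schematic only, with `post_edit` and `quality` standing in for an APE call and any reference-free QE metric, neither of which is part of the released implementation.

```python
# Schematic outline of MQM-APE-style error filtering: keep an error annotation only
# if post-editing the translation according to that error improves a quality score.
def filter_errors(source, translation, errors, post_edit, quality):
    kept = []
    base_score = quality(source, translation)
    for error in errors:
        edited = post_edit(source, translation, error)   # fix only this error
        if quality(source, edited) > base_score:         # error was impactful
            kept.append(error)
    return kept
```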
no code implementations • 19 Sep 2024 • Jun Rao, Xuebo Liu, Zepeng Lin, Liang Ding, Jing Li, DaCheng Tao, Min Zhang
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
no code implementations • 9 Sep 2024 • Shuai Wang, Liang Ding, Li Shen, Yong Luo, Zheng He, Wei Yu, DaCheng Tao
Then, we selectively eliminate output noise induced by lame prompts based on the uncertainty of the prediction distribution from the standard prompt.
1 code implementation • 28 Aug 2024 • Wenbin Wang, Liang Ding, Minyan Zeng, Xiabin Zhou, Li Shen, Yong Luo, DaCheng Tao
Building upon this insight, we propose Divide, Conquer and Combine (DC$^2$), a novel training-free framework for enhancing MLLM perception of HR images.
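As a rough illustration of the "divide" stage only, the sketch below tiles an HR image into patches no larger than an assumed native input size (336 px is a placeholder); the actual DC$^2$ conquer and combine stages (per-patch description and aggregation) are not shown.

```python
# Rough sketch of a divide step for HR images: tile into patches that fit the
# model's assumed native resolution. Hypothetical illustration, not the DC^2 code.
from PIL import Image

def divide(image_path, patch_size=336):
    img = Image.open(image_path)
    w, h = img.size
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            box = (left, top, min(left + patch_size, w), min(top + patch_size, h))
            patches.append(img.crop(box))
    return patches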
1 code implementation • 17 Jun 2024 • Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo wang, Qi Zhang, Liang Ding, DaCheng Tao
In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals.
1 code implementation • 7 Jun 2024 • Hongyu Li, Liang Ding, Meng Fang, DaCheng Tao
Catastrophic Forgetting (CF) refers to models forgetting previously acquired knowledge when learning new data.
no code implementations • 7 Jun 2024 • Yikun Wang, Rui Zheng, Liang Ding, Qi Zhang, Dahua Lin, DaCheng Tao
As instruction-tuned large language models (LLMs) evolve, aligning pretrained foundation models presents increasing challenges.
1 code implementation • 4 Jun 2024 • Shwai He, Daize Dong, Liang Ding, Ang Li
Extensive experimental results demonstrate the effectiveness of the compression methods under our framework and the proposed recipe, achieving a 6.05x speedup and only 20.0 GB of memory usage while maintaining over 92% of performance on Mixtral-8x7B.
no code implementations • 2 May 2024 • Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, DaCheng Tao
Specifically, we augment the arbitrary first-order logical queries via binary tree decomposition, to stimulate the reasoning capability of LLMs.
1 code implementation • 29 Apr 2024 • Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, DaCheng Tao, Min Zhang
Experimental results show that MMT models trained on our dataset exhibit a greater ability to exploit visual information than those trained on other MMT datasets.
1 code implementation • 23 Apr 2024 • Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du
To this end, we propose a simple-yet-effective method, namely Deeply Understanding the Problems (DUP), to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors.
Ranked #1 on Math Word Problem Solving on SVAMP (Accuracy metric)
2 code implementations • 27 Mar 2024 • Xintong Wang, Jingheng Pan, Liang Ding, Chris Biemann
Our method is inspired by our observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules.
1 code implementation • 21 Mar 2024 • Changtong Zan, Liang Ding, Li Shen, Yibing Zhen, Weifeng Liu, DaCheng Tao
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
no code implementations • 20 Mar 2024 • Lu Zou, Liang Ding
By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively.
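For context, the back-fitting iteration analyzed here sweeps over the additive components, refitting each one to the residual left by the others; the result above says this iteration contracts no faster than $(1-\mathcal{O}(1/n))^t$. The schematic form below uses illustrative notation, not the paper's exact definitions.

```latex
% One back-fitting sweep for an additive model f = \sum_{j=1}^{d} f_j:
% each component is refit (via its smoother S_j) to the residual of the others.
f_j^{(t+1)} \;=\; S_j\!\Big( y - \sum_{k<j} f_k^{(t+1)} - \sum_{k>j} f_k^{(t)} \Big),
\qquad j = 1,\dots,d .
```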
1 code implementation • 15 Mar 2024 • Ziyang Xu, Keqin Peng, Liang Ding, DaCheng Tao, Xiliang Lu
Experiments across various prompts, PLMs, and benchmarks show that our approach can not only correct the overfitted performance caused by prompt bias, but also significantly improve the prompt retrieval capability (up to 10% absolute performance gain).
no code implementations • 5 Mar 2024 • Zhonghai Wang, Jie Jiang, Yibing Zhan, Bohao Zhou, Yanhong Li, Chong Zhang, Liang Ding, Hua Jin, Jun Peng, Xu Lin, Weifeng Liu
3) We introduce a standardized benchmark for evaluating medical LLMs in Anesthesiology.
no code implementations • 20 Feb 2024 • Zhiyao Ren, Yibing Zhan, Baosheng Yu, Liang Ding, DaCheng Tao
The copilot framework, which aims to enhance and tailor large language models (LLMs) for specific complex tasks without requiring fine-tuning, is gaining increasing attention from the community.
no code implementations • 19 Feb 2024 • Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, DaCheng Tao
Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment.
1 code implementation • 19 Feb 2024 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical.
1 code implementation • 19 Feb 2024 • Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, DaCheng Tao
Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model.
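For background, the textbook KD objective that such compression builds on combines a hard-label cross-entropy term with a temperature-softened teacher term; this is the standard Hinton-style formulation, not this paper's specific loss.

```latex
% Textbook knowledge distillation objective (not this paper's exact loss):
% student distribution q, teacher distribution p, temperature T, mixing weight \alpha.
\mathcal{L}_{\mathrm{KD}}
  \;=\; \alpha\,\mathrm{CE}\big(y,\, q\big)
  \;+\; (1-\alpha)\, T^{2}\, \mathrm{KL}\big(p^{(T)} \,\big\|\, q^{(T)}\big),
\qquad
p^{(T)}_{i} = \frac{\exp\!\big(z^{\mathrm{teacher}}_{i}/T\big)}{\sum_{j}\exp\!\big(z^{\mathrm{teacher}}_{j}/T\big)} .
```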
1 code implementation • 14 Feb 2024 • Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, DaCheng Tao
Inspired by this finding, we propose the Cluster Separation Index (CSI), which quantifies deviations in the IB latent space, as an indicator of reward overoptimization to facilitate the development of online mitigation strategies.
no code implementations • 6 Feb 2024 • Liang Ding, Rui Tuo
It is well known that the state space (SS) model formulation of a Gaussian process (GP) can reduce both its training and prediction time to $\mathcal{O}(n)$ for $n$ data points.
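As standard background (not this paper's contribution), the SS reformulation rewrites the GP prior as a linear Gaussian state-space model over ordered inputs, so Kalman filtering and smoothing deliver the $\mathcal{O}(n)$ cost:

```latex
% State-space form of a Markovian (e.g., Matern) GP on ordered inputs t_1 < ... < t_n;
% Kalman filtering/smoothing over this model costs O(n).
\mathbf{x}_{k} = A_{k}\,\mathbf{x}_{k-1} + \mathbf{q}_{k}, \quad \mathbf{q}_{k} \sim \mathcal{N}(0, Q_{k}),
\qquad
y_{k} = H\,\mathbf{x}_{k} + \varepsilon_{k}, \quad \varepsilon_{k} \sim \mathcal{N}(0, \sigma^{2}),
\qquad
f(t_{k}) = H\,\mathbf{x}_{k} .
```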
1 code implementation • 22 Jan 2024 • Keqin Peng, Liang Ding, Yancheng Yuan, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
no code implementations • 12 Jan 2024 • Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, DaCheng Tao
Sentiment analysis is rapidly advancing by utilizing various data modalities (e.g., text, image).
1 code implementation • 12 Jan 2024 • Yuqi Zhang, Liang Ding, Lefei Zhang, DaCheng Tao
Extensive experiments on varying jailbreak benchmarks across a wide range of LLMs show that $\mathbb{IA}$ could consistently and significantly reduce the harmfulness in responses (a 48.2% reduction in attack success rate on average).
1 code implementation • 12 Jan 2024 • Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, DaCheng Tao
Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks largely neglect object-oriented programming (OOP) in favor of functional programming (FP), e.g., HumanEval and MBPP.
no code implementations • 11 Jan 2024 • Shilong Pan, Zhiliang Tian, Liang Ding, Zhen Huang, Zhihua Wen, Dongsheng Li
POMP involves constructing a directed acyclic meta-graph for each source language, from which we dynamically sample multiple paths to prompt LLMs to mitigate the linguistic noise and improve translations during training.
no code implementations • CVPR 2024 • Zhiyuan Yu, Li Shen, Liang Ding, Xinmei Tian, Yixin Chen, DaCheng Tao
To address these challenges, we introduce PreBackRazor, a novel activation pruning scheme offering both computational and memory efficiency through a sparsified backpropagation strategy that judiciously avoids unnecessary activation pruning, storage, and gradient computation.
1 code implementation • 11 Dec 2023 • Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, DaCheng Tao
At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model.
no code implementations • 9 Dec 2023 • Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu, Bo Du
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.
1 code implementation • 26 Nov 2023 • Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong
The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together.
no code implementations • 20 Oct 2023 • Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
The key algorithm in solving ZSAQ is the SAM-SGA optimization, which aims to improve the quantization accuracy and model generalization via optimizing a minimax problem.
no code implementations • 15 Oct 2023 • Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, DaCheng Tao
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost.
1 code implementation • 15 Oct 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao
Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly as the number of activated experts increases, limiting its practical utility.
1 code implementation • 28 Sep 2023 • Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng Liu, DaCheng Tao
Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between unseen language pairs in training data.
no code implementations • 27 Sep 2023 • Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
Specifically, we categorize existing deep model fusion methods into four groups: (1) "Mode connectivity", which connects solutions in weight space via a path of non-increasing loss in order to obtain a better initialization for model fusion; (2) "Alignment", which matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method that averages the weights of multiple models to obtain more accurate results closer to the optimal solution; and (4) "Ensemble learning", which combines the outputs of diverse models, a foundational technique for improving the accuracy and robustness of the final model.
no code implementations • 30 Aug 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao
Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained language models to downstream tasks.
no code implementations • 29 Aug 2023 • Qingyue Wang, Liang Ding, Yanan Cao, Zhiliang Tian, Shi Wang, DaCheng Tao, Li Guo
We evaluate our method on both open and closed LLMs, and the experiments on the widely-used public dataset show that our method can generate more consistent responses in a long-context conversation.
1 code implementation • 24 Aug 2023 • Fei Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding
The multimedia community has shown a significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, and among them, visual-language pretraining (VLP) is currently the most captivating topic.
no code implementations • 30 Jul 2023 • Yan Sun, Li Shen, Hao Sun, Liang Ding, DaCheng Tao
Adaptive optimization has achieved notable success in distributed learning, while extending adaptive optimizers to federated learning (FL) suffers from severe inefficiency, including (i) rugged convergence due to inaccurate gradient estimation in the global adaptive optimizer, and (ii) client drift exacerbated by local over-fitting with the local adaptive optimizer.
no code implementations • 13 Jul 2023 • Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling
We evaluated our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate the effectiveness of the proposed method for handling data scarcity problems, including long-tailed and few-shot egocentric action recognition.
1 code implementation • 5 Jun 2023 • Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, DaCheng Tao
Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios.
no code implementations • 1 Jun 2023 • Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang, DaCheng Tao, Li Guo
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data.
1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy.
1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, DaCheng Tao
Token dropping is a recently-proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers.
no code implementations • 22 May 2023 • Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, DaCheng Tao
However, most mixup methods do not consider the varying degree of learning difficulty at different stages of training, and they generate new samples with one-hot labels, resulting in model overconfidence.
1 code implementation • 19 May 2023 • Yan Sun, Li Shen, Shixiang Chen, Liang Ding, DaCheng Tao
In federated learning (FL), a cluster of local clients are chaired under the coordination of the global server and cooperatively train one model with privacy protection.
no code implementations • 5 May 2023 • Liang Ding, Tianyang Hu, Jiahang Jiang, Donghao Li, Wenjia Wang, Yuan YAO
In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces.
no code implementations • 29 Apr 2023 • Lu Zou, HaoYuan Chen, Liang Ding
We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and gradient of these three functions for additive GPs, all in $O(n \log n)$ time.
1 code implementation • 20 Apr 2023 • Chiaming Hsu, Changtong Zan, Liang Ding, Longyue Wang, Xiaoting Wang, Weifeng Liu, Fu Lin, Wenbin Hu
Experiments on WMT17-EnZh XRE also show the effectiveness of our Prompt-XRE against other competitive baselines.
no code implementations • 7 Apr 2023 • Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, DaCheng Tao
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
1 code implementation • 24 Mar 2023 • Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, DaCheng Tao
To further improve the performance of LLMs on MT quality assessment, we investigate several prompting designs, and propose a new prompting method called Error Analysis Prompting (EAPrompt) by combining Chain-of-Thoughts (Wei et al., 2022) and Error Analysis (Lu et al., 2023).
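An illustrative two-step template in the spirit of EAPrompt follows; it is a paraphrased sketch, not the exact prompt released with the paper.

```python
# Illustrative EAPrompt-style template: step 1 elicits error identification,
# step 2 elicits counting/scoring based on the identified errors.
EA_PROMPT = """You are evaluating a translation.

Source: {source}
Translation: {translation}

Step 1: List the major errors and minor errors in the translation,
following MQM error categories.
Step 2: Count the major and minor errors identified in Step 1 and
report a final quality score based on them."""

def build_prompt(source: str, translation: str) -> str:
    return EA_PROMPT.format(source=source, translation=translation)
```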
1 code implementation • 24 Mar 2023 • Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao
We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually can achieve better performance; 2) Emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still need to be highlighted for the MT/NLP community.
no code implementations • 1 Mar 2023 • Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, DaCheng Tao
Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically to train large-scale deep neural networks without theoretical guarantee due to the triple difficulties in analyzing the coupled perturbation step, adaptive learning rate and momentum step.
no code implementations • 1 Mar 2023 • Chao Xue, Wei Liu, Shuai Xie, Zhenfang Wang, Jiaxing Li, Xuyang Peng, Liang Ding, Shanshan Zhao, Qiong Cao, Yibo Yang, Fengxiang He, Bohua Cai, Rongcheng Bian, Yiyan Zhao, Heliang Zheng, Xiangyang Liu, Dongkai Liu, Daqing Liu, Li Shen, Chang Li, Shijin Zhang, Yukang Zhang, Guanpu Chen, Shixiang Chen, Yibing Zhan, Jing Zhang, Chaoyue Wang, DaCheng Tao
Automated machine learning (AutoML) seeks to build ML models with minimal human effort.
2 code implementations • 21 Feb 2023 • Yan Sun, Li Shen, Tiansheng Huang, Liang Ding, DaCheng Tao
Federated learning is an emerging distributed machine learning framework which jointly trains a global model via a large number of local devices with data privacy protections.
1 code implementation • 19 Feb 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
no code implementations • 18 Feb 2023 • Qihuang Zhong, Liang Ding, Keqin Peng, Juhua Liu, Bo Du, Li Shen, Yibing Zhan, DaCheng Tao
This technical report briefly describes our JDExplore d-team's submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
no code implementations • 20 Dec 2022 • Baopu Qiu, Liang Ding, Di wu, Lin Shang, Yibing Zhan, DaCheng Tao
Machine Translation Quality Estimation (QE) is the task of evaluating translation output in the absence of human-written references.
1 code implementation • 20 Dec 2022 • Qingyu Lu, Liang Ding, Liping Xie, Kanjian Zhang, Derek F. Wong, DaCheng Tao
To this end, we augment BARTScore by incorporating the human-like error analysis strategies, namely BARTScore++, where the final score consists of both the evaluations of major errors and minor errors.
no code implementations • 4 Dec 2022 • Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, DaCheng Tao
This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
Ranked #1 on Common Sense Reasoning on ReCoRD
1 code implementation • 2 Dec 2022 • Hexuan Deng, Liang Ding, Xuebo Liu, Meishan Zhang, DaCheng Tao, Min Zhang
Preliminary experiments on En-Zh and En-Ja news domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh).
1 code implementation • 10 Nov 2022 • Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, DaCheng Tao
The main contributions of our work are challenging the basic commonsense in dynamic networks and proposing a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones.
1 code implementation • 11 Oct 2022 • Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, DaCheng Tao
Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization.
1 code implementation • 9 Oct 2022 • Shwai He, Liang Ding, Daize Dong, Miao Zhang, DaCheng Tao
Adapter Tuning, which freezes the pretrained language models (PLMs) and only fine-tunes a few extra modules, becomes an appealing efficient alternative to the full model fine-tuning.
1 code implementation • 20 Sep 2022 • Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao
As for model sizes, we scale the Transformer-Big up to an extremely large model with nearly 4.7 billion parameters, to fully enhance the model capacity for our Vega-MT.
Ranked #1 on Machine Translation on WMT 2022 English-Russian
1 code implementation • COLING 2022 • Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao
Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
1 code implementation • 22 Aug 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Prompt Transfer (PoT) is a recently-proposed approach to improve prompt-tuning, by initializing the target prompt with the existing prompt trained on similar source tasks.
no code implementations • 18 Jul 2022 • Chuang Liu, Xueqi Ma, Yibing Zhan, Liang Ding, Dapeng Tao, Bo Du, Wenbin Hu, Danilo Mandic
However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists.
no code implementations • 4 Jul 2022 • Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Li Shen, DaCheng Tao
Although vision-and-language pretraining (VLP)-equipped cross-modal image-text retrieval (ITR) has achieved remarkable progress in the past two years, it suffers from a major drawback: the ever-increasing size of VLP models restricts deployment in real-world search scenarios, where high latency is unacceptable.
1 code implementation • 30 May 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays an important but under-exploited role compared with the decoder regarding downstream performance and neuron activation.
no code implementations • 28 May 2022 • Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, DaCheng Tao
In this paper, we present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer by updating relatively few partial parameters.
1 code implementation • NAACL 2022 • Hanhao Qu, Yu Cao, Jun Gao, Liang Ding, Ruifeng Xu
We present IBR, an Iterative Backward Reasoning model to solve the proof generation tasks on rule-based Question Answering (QA), where models are required to reason over a series of textual rules and facts to find out the related proof path and derive the final answer.
no code implementations • 16 Apr 2022 • Zheng Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, DaCheng Tao
Data augmentation (DA) is core to achieving robust sequence-to-sequence learning on various natural language processing (NLP) tasks.
1 code implementation • COLING 2022 • Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, DaCheng Tao
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence.
Tasks: Aspect-Based Sentiment Analysis (ABSA), +4 more
1 code implementation • 16 Apr 2022 • Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao
For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g., mBART, the self-supervised pretraining task is trained on a wide range of monolingual languages, e.g., 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally progress on a bilingual language subset, e.g., English-German. This creates a data discrepancy, namely the domain discrepancy, and a cross-lingual learning objective discrepancy, namely the task discrepancy, between the pretraining and finetuning stages.
no code implementations • 5 Apr 2022 • Shwai He, Chenbo Jiang, Daize Dong, Liang Ding
Dynamic convolution achieves better performance for efficient CNNs at the cost of negligible FLOPs increase.
1 code implementation • CVPR 2022 • Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Ling-Yu Duan
Instead, we propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation.
1 code implementation • 8 Mar 2022 • Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, DaCheng Tao
In contrast to previous works, we focus on the reproducibility of the approaches and the examination of the elements that lead to improved performance by pretrained and nonpretrained models in retrieving images and text.
no code implementations • 7 Mar 2022 • HaoYuan Chen, Liang Ding, Rui Tuo
We develop an exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations whose smoothness parameter $\nu$ is a half-integer.
no code implementations • 19 Jan 2022 • Liang Ding, Keqin Peng, DaCheng Tao
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
1 code implementation • 13 Jan 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, DaCheng Tao
To this end, we propose a knowledge graph augmented network (KGAN), which aims to effectively incorporate external knowledge with explicitly syntactic and contextual information.
Tasks: Aspect-Based Sentiment Analysis (ABSA), +2 more
no code implementations • 11 Dec 2021 • Liang Ding, Rui Tuo, Shahin Shahrampour
In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions.
1 code implementation • 26 Oct 2021 • Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, DaCheng Tao
In practice, we formulate the models pretrained on the sampled instances into a knowledge guidance model and a learner model, respectively.
Tasks: Aspect-Based Sentiment Analysis (ABSA), +2 more
1 code implementation • Findings (EMNLP) 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu
Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT).
no code implementations • 29 Sep 2021 • Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Lingyu Duan
On the contrary, we propose a new solution: fine-tuning the global model on the fly in the server via data-free distillation to boost its performance, dubbed FLBoost, to relieve the issue of direct model aggregation.
no code implementations • EMNLP 2021 • Liang Ding, Di wu, DaCheng Tao
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
no code implementations • 24 Jul 2021 • Liang Ding, Di wu, DaCheng Tao
Our constrained system is based on a pipeline framework, i.e., ASR followed by NMT.
no code implementations • 19 Jul 2021 • Liang Ding, Rui Tuo, Xiaowei Zhang
High-dimensional simulation optimization is notoriously challenging.
1 code implementation • Findings (ACL) 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu
In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding.
no code implementations • Findings (ACL) 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence.
1 code implementation • ACL 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
Results demonstrate that the proposed approach can significantly and universally improve translation quality by reducing translation errors on low-frequency words.
no code implementations • ACL (IWSLT) 2021 • Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda
In the field of machine learning, a well-trained model is assumed to be able to recover the training labels, i.e., the synthetic labels predicted by the model should be as close to the ground-truth labels as possible.
no code implementations • 13 Apr 2021 • Di wu, Yiren Chen, Liang Ding, DaCheng Tao
A spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones.
Tasks: Automatic Speech Recognition (ASR), +7 more
1 code implementation • 2 Mar 2021 • Yu Cao, Liang Ding, Zhiliang Tian, Meng Fang
Dialogue generation models face the challenge of producing generic and repetitive responses.
no code implementations • IWSLT (ACL) 2022 • Di wu, Liang Ding, Shuo Yang, Mingyang Li
Recently, the performance of the neural word alignment models has exceeded that of statistical models.
no code implementations • 1 Jan 2021 • Di wu, Liang Ding, Shuo Yang, DaCheng Tao
Recently, the performance of the neural word alignment models has exceeded that of statistical models.
no code implementations • ICLR 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing the lexical choice of NAT model and that embedded in the raw data.
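Schematically, such a regularized objective adds a KL term between the lexical distribution embedded in the raw bilingual data and the NAT model's prediction on top of the usual NAT loss; the notation below is illustrative rather than the paper's exact definition.

```latex
% Schematic regularized objective: standard NAT loss plus a KL term comparing the
% raw-data lexical distribution with the NAT model's lexical choice (illustrative).
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{NAT}}(\theta)
\;+\; \lambda\,\mathrm{KL}\!\big( p_{\mathrm{raw}}(y \mid x) \;\big\|\; p_{\theta}(y \mid x) \big).
```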
1 code implementation • ICLR 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Zhaopeng Tu
Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks.
1 code implementation • COLING 2020 • Liang Ding, Longyue Wang, Di wu, DaCheng Tao, Zhaopeng Tu
Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence.
no code implementations • 14 Oct 2020 • Liang Ding, Xiaowei Zhang
However, its use is limited to cases where the design space is low-dimensional because, in general, the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space.
no code implementations • WMT (EMNLP) 2020 • Lei Zhou, Liang Ding, Koichi Takeda
In response to this issue, we propose to expose explicit cross-lingual patterns, e.g., word alignments and generation score, to our proposed zero-shot models.
1 code implementation • EMNLP 2020 • Di wu, Liang Ding, Fan Lu, Jian Xie
Slot filling and intent detection are two main tasks in spoken language understanding (SLU) system.
no code implementations • 5 Jun 2020 • Liang Ding, Lu Zou, Wenjia Wang, Shahin Shahrampour, Rui Tuo
Density estimation plays a key role in many tasks in machine learning, statistical inference, and visualization.
no code implementations • ACL 2020 • Liang Ding, Long-Yue Wang, DaCheng Tao
Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
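For context, the fixed sinusoidal encoding from the original Transformer is the standard scheme that assigns deterministic indices and embeddings by absolute position; it is shown here as general background, not as this paper's proposed encoding.

```python
# Standard fixed sinusoidal position encoding (Vaswani et al., 2017), the common
# baseline for position-encoding work; not this paper's proposal.
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]                     # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)        # per-dimension angle rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                             # even dims: sine
    pe[:, 1::2] = np.cos(angles)                             # odd dims: cosine
    return pe
```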
no code implementations • ICML 2020 • Liang Ding, Rui Tuo, Shahin Shahrampour
Despite their success, kernel methods suffer from a massive computational cost in practice.
no code implementations • 19 Aug 2019 • Liang Ding, DaCheng Tao
Syntax-incorporated machine translation models have been proven successful in improving the model's reasoning and meaning preservation ability.
no code implementations • WS 2019 • Liang Ding, DaCheng Tao
This paper describes the University of Sydney's submission to the WMT 2019 shared news translation task.
Ranked #1 on Machine Translation on WMT 2018 Finnish-English
1 code implementation • 21 Jan 2018 • Liang Ding, Di Chang, Russell Malmberg, Aaron Martinez, David Robinson, Matthew Wicker, Hongfei Yan, Liming Cai
The seminal work of Chow and Liu (1968) shows that approximation of a finite probabilistic system by Markov trees can achieve the minimum information loss with the topology of a maximum spanning tree.
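A minimal sketch of the Chow-Liu construction referenced above: estimate pairwise mutual information from discrete data, then take a maximum spanning tree over those weights. This is background illustration only; the paper generalizes well beyond this basic procedure.

```python
# Chow-Liu tree sketch: pairwise empirical mutual information + maximum spanning tree.
import networkx as nx
from sklearn.metrics import mutual_info_score

def chow_liu_tree(samples):
    """samples: list of equal-length tuples of discrete values (one tuple per observation)."""
    n_vars = len(samples[0])
    columns = list(zip(*samples))                            # one column per variable
    g = nx.Graph()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            mi = mutual_info_score(columns[i], columns[j])   # empirical mutual information
            g.add_edge(i, j, weight=mi)
    return nx.maximum_spanning_tree(g)                       # edges of the Chow-Liu tree
```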