no code implementations • WMT (EMNLP) 2020 • Longyue Wang, Zhaopeng Tu, Xing Wang, Li Ding, Liang Ding, Shuming Shi
This paper describes the Tencent AI Lab's submission to the WMT 2020 shared task on chat translation for English-German.
1 code implementation • ACL 2022 • Liang Ding, Longyue Wang, Shuming Shi, DaCheng Tao, Zhaopeng Tu
In this work, we provide an appealing alternative for NAT: monolingual KD, which trains the NAT student on external monolingual data with an AT teacher trained on the original bilingual data.
no code implementations • ACL (IWSLT) 2021 • Liang Ding, DaCheng Tao
Our constrained system is based on a pipeline framework, i.e., ASR followed by NMT.
no code implementations • 2 May 2024 • Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, DaCheng Tao
Specifically, we augment the arbitrary first-order logical queries via binary tree decomposition, to stimulate the reasoning capability of LLMs.
1 code implementation • 29 Apr 2024 • Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, DaCheng Tao, Min Zhang
Experimental results show that MMT models trained on our dataset exhibit a greater ability to exploit visual information than those trained on other MMT datasets.
no code implementations • 23 Apr 2024 • Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, DaCheng Tao
The Chain-of-Thought prompting strategy has enhanced the performance of Large Language Models (LLMs) across various NLP tasks.
Ranked #1 on Math Word Problem Solving on SVAMP (Accuracy metric)
no code implementations • 27 Mar 2024 • Xintong Wang, Jingheng Pan, Liang Ding, Chris Biemann
Our method is inspired by our observation that so-called disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules.
no code implementations • 21 Mar 2024 • Changtong Zan, Liang Ding, Li Shen, Yibing Zhan, Weifeng Liu, DaCheng Tao
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
no code implementations • 20 Mar 2024 • Lu Zou, Liang Ding
By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively.
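The back-fitting iteration analyzed above can be sketched for a generic additive model (a minimal illustration with a toy k-NN smoother and synthetic data, not the paper's kernel-packet algorithm):

```python
import numpy as np

def backfit(X, y, smooth, n_iter=20):
    # Generic back-fitting for an additive model y ≈ mean + f1(x1) + ... + fd(xd):
    # each component is repeatedly refit to the partial residual that excludes it.
    n, d = X.shape
    f = np.zeros((n, d))
    mean = y.mean()
    for _ in range(n_iter):
        for j in range(d):
            partial = y - mean - f.sum(axis=1) + f[:, j]
            f[:, j] = smooth(X[:, j], partial)
            f[:, j] -= f[:, j].mean()   # keep components centered
    return mean, f

def knn_smooth(x, r, k=5):
    # toy 1-D smoother: average the k nearest neighbors along the sorted axis
    order = np.argsort(x)
    fitted = np.empty_like(r)
    for i in range(len(x)):
        pos = np.searchsorted(x[order], x[i])
        nbrs = order[max(0, pos - k // 2):][:k]
        fitted[i] = r[nbrs].mean()
    return fitted

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)
mean, f = backfit(X, y, knn_smooth)
resid = y - mean - f.sum(axis=1)
print(np.var(resid) < np.var(y))  # True: the fit explains most of the variance
```

The paper's contribution concerns the convergence rate of exactly this kind of sweep; the smoother above is only a stand-in for the GP component updates.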
1 code implementation • 15 Mar 2024 • Ziyang Xu, Keqin Peng, Liang Ding, DaCheng Tao, Xiliang Lu
Experiments across various prompts, PLMs, and benchmarks show that our approach can not only correct the overfitted performance caused by prompt bias, but also significantly improve the prompt retrieval capability (up to 10% absolute performance gain).
no code implementations • 5 Mar 2024 • Zhonghai Wang, Jie Jiang, Yibing Zhan, Bohao Zhou, Yanhong Li, Chong Zhang, Liang Ding, Hua Jin, Jun Peng, Xu Lin, Weifeng Liu
3) We introduce a standardized benchmark for evaluating medical LLMs in anesthesiology.
no code implementations • 20 Feb 2024 • Zhiyao Ren, Yibing Zhan, Baosheng Yu, Liang Ding, DaCheng Tao
The copilot framework, which aims to enhance and tailor large language models (LLMs) for specific complex tasks without requiring fine-tuning, is gaining increasing attention from the community.
no code implementations • 19 Feb 2024 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical.
no code implementations • 19 Feb 2024 • Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, DaCheng Tao
Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model.
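The standard KD objective mentioned here can be sketched as the classic Hinton-style loss (the temperature, weighting, and toy logits below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Classic KD (Hinton et al., 2015): blend cross-entropy on hard labels with
    # the KL divergence between temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient magnitudes comparable.
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    soft = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1).mean() * T * T
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * soft + (1 - alpha) * hard

logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
labels = np.array([0, 1])
print(round(kd_loss(logits, logits, labels, alpha=1.0), 6))  # 0.0: soft term vanishes when student matches teacher
```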
no code implementations • 19 Feb 2024 • Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, DaCheng Tao
Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment.
no code implementations • 14 Feb 2024 • Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, DaCheng Tao
Inspired by this finding, we propose the Integrated Cluster Deviation Score (ICDS), which quantifies deviations in the latent space, as an indicator of reward overoptimization to facilitate the development of online mitigation strategies.
no code implementations • 6 Feb 2024 • Liang Ding, Rui Tuo
It is well known that the state space (SS) model formulation of a Gaussian process (GP) can lower both its training and prediction time to $\mathcal{O}(n)$ for $n$ data points.
no code implementations • 22 Jan 2024 • Keqin Peng, Liang Ding, Yancheng Yuan, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
1 code implementation • 12 Jan 2024 • Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, DaCheng Tao
Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks largely neglect object-oriented programming (OOP) in favor of functional programming (FP), e.g., HumanEval and MBPP.
1 code implementation • 12 Jan 2024 • Yuqi Zhang, Liang Ding, Lefei Zhang, DaCheng Tao
Aligning large language models (LLMs) with human values, particularly in the face of complex and stealthy jailbreak attacks, presents a formidable challenge.
no code implementations • 12 Jan 2024 • Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, DaCheng Tao
Sentiment analysis is rapidly advancing by utilizing various data modalities (e.g., text, image).
no code implementations • 11 Jan 2024 • Shilong Pan, Zhiliang Tian, Liang Ding, Zhen Huang, Zhihua Wen, Dongsheng Li
POMP involves constructing a directed acyclic meta-graph for each source language, from which we dynamically sample multiple paths to prompt LLMs to mitigate the linguistic noise and improve translations during training.
1 code implementation • 11 Dec 2023 • Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, DaCheng Tao
At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model.
no code implementations • 9 Dec 2023 • Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu, Bo Du
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.
1 code implementation • 26 Nov 2023 • Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong
The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together.
no code implementations • 20 Oct 2023 • Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
The key algorithm in solving ZSAQ is the SAM-SGA optimization, which aims to improve the quantization accuracy and model generalization via optimizing a minimax problem.
1 code implementation • 15 Oct 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao
Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly as the number of activated experts increases, limiting its practical utility.
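The sparse routing such MoE layers rely on can be sketched as top-k gating (the toy router, expert shapes, and data below are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=1):
    # Sparse MoE layer: a router scores all experts, but only the top-k are
    # run per token, so compute scales with k rather than the expert count.
    logits = x @ gate_w                        # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        g = np.exp(logits[t, sel])
        g /= g.sum()                           # renormalize selected gates
        for w, e in zip(g, sel):
            out[t] += w * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
# each expert is a small linear map (default-arg trick binds a fresh W per lambda)
experts = [lambda v, W=rng.standard_normal((d, d)) / d: v @ W for _ in range(n_exp)]
x = rng.standard_normal((5, d))
gate_w = rng.standard_normal((d, n_exp))
y = moe_forward(x, gate_w, experts, k=1)
print(y.shape)  # (5, 8)
```

Increasing `k` here directly multiplies the per-token expert compute, which is the escalation the abstract refers to.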
no code implementations • 15 Oct 2023 • Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, DaCheng Tao
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost.
1 code implementation • 28 Sep 2023 • Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng Liu, DaCheng Tao
Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between language pairs unseen in the training data.
no code implementations • 27 Sep 2023 • Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
Specifically, we categorize existing deep model fusion methods into four classes: (1) "Mode connectivity", which connects solutions in weight space via a path of non-increasing loss to obtain better initializations for model fusion; (2) "Alignment", which matches units between neural networks to create better conditions for fusion; (3) "Weight averaging", a classical model fusion method that averages the weights of multiple models to obtain more accurate results closer to the optimal solution; (4) "Ensemble learning", which combines the outputs of diverse models, a foundational technique for improving the accuracy and robustness of the final model.
no code implementations • 30 Aug 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao
Adapter tuning, which updates only a few parameters, has become a mainstream method for adapting pretrained language models to downstream tasks.
no code implementations • 29 Aug 2023 • Qingyue Wang, Liang Ding, Yanan Cao, Zhiliang Tian, Shi Wang, DaCheng Tao, Li Guo
We evaluate our method on both open and closed LLMs, and the experiments on the widely-used public dataset show that our method can generate more consistent responses in a long-context conversation.
1 code implementation • 24 Aug 2023 • Fei Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding
The multimedia community has shown a significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, and among them, vision-language pretraining (VLP) is currently the most captivating topic.
no code implementations • 30 Jul 2023 • Yan Sun, Li Shen, Hao Sun, Liang Ding, DaCheng Tao
Adaptive optimization has achieved notable success in distributed learning, but extending adaptive optimizers to federated learning (FL) suffers from severe inefficiency, including (i) rugged convergence due to inaccurate gradient estimation in the global adaptive optimizer, and (ii) client drift exacerbated by local over-fitting with the local adaptive optimizer.
no code implementations • 13 Jul 2023 • Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling
We evaluated our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate the effectiveness of the proposed method for handling data scarcity problems, including long-tailed and few-shot egocentric action recognition.
1 code implementation • 5 Jun 2023 • Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, DaCheng Tao
Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios.
no code implementations • 1 Jun 2023 • Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang, DaCheng Tao, Li Guo
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data.
1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, DaCheng Tao
Token dropping is a recently proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers.
1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy.
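The random masking strategy this line of work revisits can be sketched as the standard BERT 80/10/10 recipe (the `[MASK]` id, vocabulary size, and masking rate below are illustrative assumptions):

```python
import numpy as np

MASK, VOCAB = 103, 30522  # hypothetical [MASK] id and vocab size

def random_mask(tokens, rng, p=0.15):
    # BERT-style random masking: each token is selected with probability p;
    # of the selected tokens, 80% become [MASK], 10% a random token, and 10%
    # stay unchanged. Labels are kept only at selected positions.
    tokens = np.array(tokens)                  # copy; leave the input intact
    labels = np.full_like(tokens, -100)        # -100 = ignored by the loss
    sel = rng.random(tokens.shape) < p
    labels[sel] = tokens[sel]
    roll = rng.random(tokens.shape)
    tokens[sel & (roll < 0.8)] = MASK
    rand = sel & (roll >= 0.8) & (roll < 0.9)
    tokens[rand] = rng.integers(0, VOCAB, rand.sum())
    return tokens, labels

rng = np.random.default_rng(0)
ids = np.arange(1000, 1100)
masked, labels = random_mask(ids, rng)
print(int((labels != -100).sum()), "positions selected for prediction")
```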
no code implementations • 22 May 2023 • Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, DaCheng Tao
However, most mixup methods do not consider the varying degree of learning difficulty at different stages of training, and they generate new samples with one-hot labels, resulting in model over-confidence.
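For reference, the vanilla mixup with one-hot labels that this abstract critiques can be sketched as a convex combination of inputs and labels (the interpolation parameter below is an illustrative default):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # Vanilla mixup: convex-combine two examples and their one-hot labels,
    # producing soft labels instead of a single hard class.
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x_mix, y_mix = mixup(np.ones(4), np.array([1.0, 0.0]),
                     np.zeros(4), np.array([0.0, 1.0]), rng=rng)
print(round(float(y_mix.sum()), 6))  # 1.0: the soft label is still a distribution
```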
1 code implementation • 19 May 2023 • Yan Sun, Li Shen, Shixiang Chen, Liang Ding, DaCheng Tao
In federated learning (FL), a set of local clients is coordinated by a global server to cooperatively train one model with privacy protection.
no code implementations • 5 May 2023 • Liang Ding, Tianyang Hu, Jiahang Jiang, Donghao Li, Wenjia Wang, Yuan Yao
In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces.
no code implementations • 29 Apr 2023 • Lu Zou, HaoYuan Chen, Liang Ding
We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and gradient of these three functions for additive GPs, all in $O(n \log n)$ time.
1 code implementation • 20 Apr 2023 • Chiaming Hsu, Changtong Zan, Liang Ding, Longyue Wang, Xiaoting Wang, Weifeng Liu, Fu Lin, Wenbin Hu
Experiments on WMT17-EnZh XRE also show the effectiveness of our Prompt-XRE against other competitive baselines.
no code implementations • 7 Apr 2023 • Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, DaCheng Tao
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
1 code implementation • 24 Mar 2023 • Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, DaCheng Tao
To further improve the performance of LLMs on MT quality assessment, we investigate several prompting designs, and propose a new prompting method called Error Analysis Prompting (EAPrompt) by combining Chain-of-Thought (Wei et al., 2022) and Error Analysis (Lu et al., 2023).
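A hypothetical template in the spirit of EAPrompt, combining a chain-of-thought error-listing step with a final score; the exact wording, languages, and scoring scale used in the paper will differ:

```python
def ea_prompt(src, hyp, src_lang="Chinese", tgt_lang="English"):
    # Hypothetical Error-Analysis-Prompting-style template: ask the LLM to
    # first enumerate major/minor errors (the chain-of-thought step), then
    # derive a score from the error counts.
    return (
        f"Source ({src_lang}): {src}\n"
        f"Translation ({tgt_lang}): {hyp}\n"
        "Step 1: List the major errors (wrong meaning) and minor errors "
        "(fluency/terminology) in the translation, one per line.\n"
        "Step 2: Based on the error counts, give a quality score from 0 to 100."
    )

print(ea_prompt("你好，世界", "Hello, world"))
```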
1 code implementation • 24 Mar 2023 • Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao
We show that: 1) the performance of ChatGPT depends largely on temperature, and a lower temperature usually achieves better performance; 2) emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still warrant attention from the MT/NLP community.
no code implementations • 1 Mar 2023 • Chao Xue, Wei Liu, Shuai Xie, Zhenfang Wang, Jiaxing Li, Xuyang Peng, Liang Ding, Shanshan Zhao, Qiong Cao, Yibo Yang, Fengxiang He, Bohua Cai, Rongcheng Bian, Yiyan Zhao, Heliang Zheng, Xiangyang Liu, Dongkai Liu, Daqing Liu, Li Shen, Chang Li, Shijin Zhang, Yukang Zhang, Guanpu Chen, Shixiang Chen, Yibing Zhan, Jing Zhang, Chaoyue Wang, DaCheng Tao
Automated machine learning (AutoML) seeks to build ML models with minimal human effort.
no code implementations • 1 Mar 2023 • Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, DaCheng Tao
Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically to train large-scale deep neural networks, but without theoretical guarantees, owing to the triple difficulty of analyzing the coupled perturbation step, adaptive learning rate, and momentum step.
1 code implementation • 21 Feb 2023 • Yan Sun, Li Shen, Tiansheng Huang, Liang Ding, DaCheng Tao
Federated learning is an emerging distributed machine learning framework which jointly trains a global model via a large number of local devices with data privacy protections.
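The federated training loop described here can be sketched with vanilla FedAvg on a linear model (client data, learning rate, and step counts below are illustrative; this is not the paper's algorithm):

```python
import numpy as np

def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
    # One FedAvg round for linear regression: each client runs a few local
    # gradient steps on its private data, then the server averages the
    # resulting weights (weighted by client data size). Raw data never
    # leaves the clients.
    new_ws, sizes = [], []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        new_ws.append(w)
        sizes.append(len(y))
    return np.average(new_ws, axis=0, weights=np.array(sizes, float))

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.standard_normal((50, 3))
    clients.append((X, X @ true_w + 0.01 * rng.standard_normal(50)))

w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients)
print(np.linalg.norm(w - true_w) < 0.1)  # True: rounds converge toward the shared solution
```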
1 code implementation • 19 Feb 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
no code implementations • 18 Feb 2023 • Qihuang Zhong, Liang Ding, Keqin Peng, Juhua Liu, Bo Du, Li Shen, Yibing Zhan, DaCheng Tao
This technical report briefly describes our JDExplore d-team's submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
no code implementations • 20 Dec 2022 • Baopu Qiu, Liang Ding, Di Wu, Lin Shang, Yibing Zhan, DaCheng Tao
Machine Translation Quality Estimation (QE) is the task of evaluating translation output in the absence of human-written references.
1 code implementation • 20 Dec 2022 • Qingyu Lu, Liang Ding, Liping Xie, Kanjian Zhang, Derek F. Wong, DaCheng Tao
To this end, we augment BARTScore by incorporating the human-like error analysis strategies, namely BARTScore++, where the final score consists of both the evaluations of major errors and minor errors.
no code implementations • 4 Dec 2022 • Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, DaCheng Tao
This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
Ranked #1 on Common Sense Reasoning on ReCoRD
1 code implementation • 2 Dec 2022 • Hexuan Deng, Liang Ding, Xuebo Liu, Meishan Zhang, DaCheng Tao, Min Zhang
Preliminary experiments on En-Zh and En-Ja news domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh).
1 code implementation • 10 Nov 2022 • Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, DaCheng Tao
The main contributions of our work are challenging the common assumption underlying dynamic networks and proposing a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones.
1 code implementation • 11 Oct 2022 • Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, DaCheng Tao
Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization.
1 code implementation • 9 Oct 2022 • Shwai He, Liang Ding, Daize Dong, Miao Zhang, DaCheng Tao
Adapter tuning, which freezes the pretrained language models (PLMs) and only fine-tunes a few extra modules, has become an appealing, efficient alternative to full model fine-tuning.
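The adapter module this line of work builds on can be sketched as a Houlsby-style bottleneck (the dimensions and initialization below are illustrative assumptions):

```python
import numpy as np

class Adapter:
    # Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    # residual connection. In adapter tuning only these small matrices are
    # trained; the PLM weights stay frozen.
    def __init__(self, d_model, d_bottleneck, rng):
        self.W_down = rng.standard_normal((d_model, d_bottleneck)) * 0.02
        self.W_up = np.zeros((d_bottleneck, d_model))  # near-identity init

    def __call__(self, h):
        return h + np.maximum(h @ self.W_down, 0.0) @ self.W_up

rng = np.random.default_rng(0)
adp = Adapter(16, 4, rng)
h = rng.standard_normal((3, 16))
out = adp(h)
print(np.allclose(out, h))  # True: zero-initialized up-projection starts as identity
```

The zero-initialized up-projection is a common trick so that inserting adapters does not perturb the frozen model's behavior at the start of training.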
1 code implementation • 20 Sep 2022 • Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao
As for model sizes, we scale the Transformer-Big up to an extremely large model with nearly 4.7 billion parameters, to fully enhance the model capacity of our Vega-MT.
Ranked #1 on Machine Translation on WMT 2022 English-Russian
1 code implementation • COLING 2022 • Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao
Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
1 code implementation • 22 Aug 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Prompt Transfer (PoT) is a recently proposed approach to improve prompt tuning, which initializes the target prompt with an existing prompt trained on similar source tasks.
no code implementations • 18 Jul 2022 • Chuang Liu, Xueqi Ma, Yibing Zhan, Liang Ding, Dapeng Tao, Bo Du, Wenbin Hu, Danilo Mandic
However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists.
no code implementations • 4 Jul 2022 • Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Li Shen, DaCheng Tao
Although vision-and-language pretraining (VLP) equipped cross-modal image-text retrieval (ITR) has achieved remarkable progress in the past two years, it suffers from a major drawback: the ever-increasing size of VLP models restricts their deployment in real-world search scenarios (where high latency is unacceptable).
1 code implementation • 30 May 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays an important but under-exploited role relative to the decoder regarding downstream performance and neuron activation.
no code implementations • 28 May 2022 • Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, DaCheng Tao
In this paper, we present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer by updating relatively few partial parameters.
1 code implementation • NAACL 2022 • Hanhao Qu, Yu Cao, Jun Gao, Liang Ding, Ruifeng Xu
We present IBR, an Iterative Backward Reasoning model to solve the proof generation tasks on rule-based Question Answering (QA), where models are required to reason over a series of textual rules and facts to find out the related proof path and derive the final answer.
1 code implementation • COLING 2022 • Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, DaCheng Tao
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence.
1 code implementation • 16 Apr 2022 • Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao
For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g., mBART, the self-supervised pretraining task is trained on a wide range of monolingual languages, e.g., 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally progress on a bilingual language subset, e.g., English-German. This creates a data discrepancy (namely domain discrepancy) and a cross-lingual learning-objective discrepancy (namely task discrepancy) between the pretraining and finetuning stages.
no code implementations • 16 Apr 2022 • Zheng Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, DaCheng Tao
Data augmentations (DA) are at the core of achieving robust sequence-to-sequence learning on various natural language processing (NLP) tasks.
no code implementations • 5 Apr 2022 • Shwai He, Chenbo Jiang, Daize Dong, Liang Ding
Dynamic convolution achieves better performance for efficient CNNs at the cost of a negligible increase in FLOPs.
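Dynamic convolution's per-input kernel aggregation can be sketched as follows (the toy attention branch and tensor shapes are assumptions for illustration, not the paper's design):

```python
import numpy as np

def dyn_conv(x, kernels, attn_w):
    # Dynamic convolution: instead of one static kernel, keep K candidate
    # kernels and aggregate them per input with attention weights computed
    # from that input, then apply the single aggregated kernel. FLOPs grow
    # only by the tiny attention branch.
    scores = x.mean(axis=-1, keepdims=True) * attn_w       # toy attention branch -> (batch, K)
    a = np.exp(scores - scores.max(-1, keepdims=True))
    a /= a.sum(-1, keepdims=True)                          # softmax over K kernels
    agg = np.einsum("bk,kio->bio", a, kernels)             # per-sample kernel
    return np.einsum("bi,bio->bo", x, agg)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6))
kernels = rng.standard_normal((3, 6, 4))                   # K=3 candidate kernels
out = dyn_conv(x, kernels, rng.standard_normal(3))
print(out.shape)  # (2, 4)
```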
1 code implementation • CVPR 2022 • Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Ling-Yu Duan
Instead, we propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation.
1 code implementation • 8 Mar 2022 • Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, DaCheng Tao
In contrast to previous works, we focus on the reproducibility of the approaches and the examination of the elements that lead to improved performance for pretrained and non-pretrained models in retrieving images and text.
no code implementations • 7 Mar 2022 • HaoYuan Chen, Liang Ding, Rui Tuo
We develop an exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations whose smoothness parameter $\nu$ is a half-integer.
no code implementations • 19 Jan 2022 • Liang Ding, Keqin Peng, DaCheng Tao
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
1 code implementation • 13 Jan 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, DaCheng Tao
To this end, we propose a knowledge graph augmented network (KGAN), which aims to effectively incorporate external knowledge alongside explicit syntactic and contextual information.
no code implementations • 11 Dec 2021 • Liang Ding, Rui Tuo, Shahin Shahrampour
In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions.
1 code implementation • 26 Oct 2021 • Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, DaCheng Tao
In practice, we instantiate the model pretrained on the sampled instances as a knowledge guidance model and a learner model.
1 code implementation • Findings (EMNLP) 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu
Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT).
no code implementations • 29 Sep 2021 • Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Lingyu Duan
On the contrary, we propose a new solution: fine-tuning the global model on the fly in the server via data-free distillation to boost its performance, dubbed FLBoost, to relieve the issue of direct model aggregation.
no code implementations • EMNLP 2021 • Liang Ding, Di Wu, DaCheng Tao
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
no code implementations • 24 Jul 2021 • Liang Ding, Di Wu, DaCheng Tao
Our constrained system is based on a pipeline framework, i.e., ASR followed by NMT.
no code implementations • 19 Jul 2021 • Liang Ding, Rui Tuo, Xiaowei Zhang
High-dimensional simulation optimization is notoriously challenging.
1 code implementation • Findings (ACL) 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu
In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding.
no code implementations • Findings (ACL) 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence.
1 code implementation • ACL 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
Results demonstrate that the proposed approach can significantly and universally improve translation quality by reducing translation errors on low-frequency words.
no code implementations • ACL (IWSLT) 2021 • Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda
In the field of machine learning, the well-trained model is assumed to be able to recover the training labels, i.e., the synthetic labels predicted by the model should be as close to the ground-truth labels as possible.
no code implementations • 13 Apr 2021 • Di Wu, Yiren Chen, Liang Ding, DaCheng Tao
A spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones.
1 code implementation • 2 Mar 2021 • Yu Cao, Liang Ding, Zhiliang Tian, Meng Fang
Dialogue generation models face the challenge of producing generic and repetitive responses.
no code implementations • IWSLT (ACL) 2022 • Di Wu, Liang Ding, Shuo Yang, Mingyang Li
Recently, the performance of neural word alignment models has exceeded that of statistical models.
no code implementations • 1 Jan 2021 • Di Wu, Liang Ding, Shuo Yang, DaCheng Tao
Recently, the performance of neural word alignment models has exceeded that of statistical models.
no code implementations • ICLR 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing the lexical choice of NAT model and that embedded in the raw data.
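The extra divergence term can be sketched as a plain KL between a lexical-choice distribution estimated from the raw data and the model's distribution (the direction of the KL and the smoothing constant below are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def lexical_kl(model_probs, raw_probs, eps=1e-9):
    # Sketch of an auxiliary KL(raw || model) term contrasting the lexical
    # distribution embedded in the raw data with the NAT model's prediction;
    # eps smoothing avoids log(0).
    p, q = raw_probs + eps, model_probs + eps
    return float((p * np.log(p / q)).sum())

p = np.array([0.7, 0.2, 0.1])
print(round(lexical_kl(p, p), 6))  # 0.0: the term vanishes when the model matches the raw statistics
```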
1 code implementation • ICLR 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Zhaopeng Tu
Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks.
1 code implementation • COLING 2020 • Liang Ding, Longyue Wang, Di Wu, DaCheng Tao, Zhaopeng Tu
Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence.
no code implementations • 14 Oct 2020 • Liang Ding, Xiaowei Zhang
However, its use is limited to cases where the design space is low-dimensional because, in general, the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space.
no code implementations • WMT (EMNLP) 2020 • Lei Zhou, Liang Ding, Koichi Takeda
In response to this issue, we propose to expose explicit cross-lingual patterns, e.g., word alignments and generation scores, to our proposed zero-shot models.
1 code implementation • EMNLP 2020 • Di Wu, Liang Ding, Fan Lu, Jian Xie
Slot filling and intent detection are the two main tasks in a spoken language understanding (SLU) system.
no code implementations • 5 Jun 2020 • Liang Ding, Lu Zou, Wenjia Wang, Shahin Shahrampour, Rui Tuo
Density estimation plays a key role in many tasks in machine learning, statistical inference, and visualization.
no code implementations • ACL 2020 • Liang Ding, Longyue Wang, DaCheng Tao
Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
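The fixed position indices referred to here follow the original Transformer's sinusoidal recipe, which can be sketched as (an illustration of the standard scheme, not this paper's proposed variant):

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    # Fixed sinusoidal position encoding: even dimensions get sin, odd get
    # cos, with geometrically spaced wavelengths (assumes d_model is even).
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_pe(4, 8)
print(pe[0, 0], pe[0, 1])  # 0.0 1.0 (sin(0) and cos(0) at position 0)
```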
no code implementations • ICML 2020 • Liang Ding, Rui Tuo, Shahin Shahrampour
Despite their success, kernel methods suffer from a massive computational cost in practice.
no code implementations • 19 Aug 2019 • Liang Ding, DaCheng Tao
Syntax-incorporated machine translation models have been proven successful in improving the model's reasoning and meaning preservation ability.
no code implementations • WS 2019 • Liang Ding, DaCheng Tao
This paper describes the University of Sydney's submission to the WMT 2019 shared news translation task.
Ranked #1 on Machine Translation on WMT 2018 Finnish-English
1 code implementation • 21 Jan 2018 • Liang Ding, Di Chang, Russell Malmberg, Aaron Martinez, David Robinson, Matthew Wicker, Hongfei Yan, Liming Cai
The seminal work of Chow and Liu (1968) shows that approximation of a finite probabilistic system by Markov trees can achieve the minimum information loss with the topology of a maximum spanning tree.
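The Chow-Liu construction can be sketched directly: estimate pairwise mutual information between the variables, then take a maximum spanning tree over it (toy binary chain data; the plug-in MI estimator and Kruskal union-find below are illustrative):

```python
import numpy as np
from itertools import combinations

def chow_liu_edges(samples):
    # Chow-Liu (1968): among all tree-structured approximations of a joint
    # distribution, the maximum spanning tree over pairwise mutual
    # information minimizes the information loss (KL to the full joint).
    n, d = samples.shape

    def mi(a, b):  # plug-in mutual information for discrete variables
        m = 0.0
        for x in np.unique(a):
            for y in np.unique(b):
                pxy = np.mean((a == x) & (b == y))
                if pxy > 0:
                    m += pxy * np.log(pxy / (np.mean(a == x) * np.mean(b == y)))
        return m

    parent = list(range(d))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(combinations(range(d), 2),
                   key=lambda e: -mi(samples[:, e[0]], samples[:, e[1]]))
    tree = []
    for i, j in edges:           # greedy Kruskal on -MI = max spanning tree
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

rng = np.random.default_rng(0)
flip = lambda v: v ^ (rng.random(2000) < 0.1)   # flip each bit w.p. 0.1
x0 = rng.integers(0, 2, 2000)
x1 = flip(x0)
x2 = flip(x1)
samples = np.stack([x0, x1, x2], axis=1)        # true structure: 0 - 1 - 2
print(sorted(chow_liu_edges(samples)))  # [(0, 1), (1, 2)]
```

On this chain the direct edges carry more mutual information than the two-hop pair (0, 2), so the recovered tree matches the generating topology.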