Search Results for author: Liang Ding

Found 43 papers, 13 papers with code

Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation

1 code implementation ACL 2022 Liang Ding, Longyue Wang, Shuming Shi, Dacheng Tao, Zhaopeng Tu

In this work, we provide an appealing alternative for NAT, monolingual KD, which trains the NAT student on external monolingual data with an AT teacher trained on the original bilingual data.

Knowledge Distillation Translation +1
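The monolingual-KD recipe above is, at its core, a data-construction step: an AT teacher trained on bilingual data translates external monolingual sentences, and the resulting pseudo-parallel pairs become the NAT student's training set. A minimal sketch of that step; `at_teacher_translate` and its toy lexicon are hypothetical stand-ins for a real autoregressive NMT model with beam search.

```python
# Sequence-level KD on monolingual data: the teacher labels monolingual
# sources, producing a distilled corpus for the NAT student.

def at_teacher_translate(source: str) -> str:
    """Placeholder teacher; a real system would decode with a trained
    autoregressive NMT model."""
    toy_lexicon = {"hallo": "hello", "welt": "world"}
    return " ".join(toy_lexicon.get(tok, tok) for tok in source.split())

def build_distilled_corpus(monolingual_sources):
    """Pair each monolingual source with the teacher's translation."""
    return [(src, at_teacher_translate(src)) for src in monolingual_sources]

corpus = build_distilled_corpus(["hallo welt", "hallo"])
```

The NAT student would then be trained on `corpus` instead of (or in addition to) the original bilingual data.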

Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks

no code implementations 18 Jul 2022 Chuang Liu, Xueqi Ma, Yibing Zhan, Liang Ding, Dapeng Tao, Bo Du, Wenbin Hu, Danilo Mandic

However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists.

Node Classification

Dynamic Contrastive Distillation for Image-Text Retrieval

no code implementations 4 Jul 2022 Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Li Shen, Dacheng Tao

Although vision-and-language pretraining (VLP) equipped cross-modal image-text retrieval (ITR) has achieved remarkable progress in the past two years, it suffers from a major drawback: the ever-increasing size of VLP models restricts their deployment to real-world search scenarios (where high latency is unacceptable).

Contrastive Learning Metric Learning +1

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation

no code implementations 30 May 2022 Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Sequence-to-sequence (seq2seq) learning has become a popular trend for pretraining language models, due to its succinct and universal framework.

Denoising Language Modelling

Parameter-Efficient and Student-Friendly Knowledge Distillation

no code implementations 28 May 2022 Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, Dacheng Tao

In this paper, we present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer by updating relatively few partial parameters.

Knowledge Distillation Transfer Learning
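The knowledge-transfer objective underlying distillation entries like this one is typically a temperature-scaled KL divergence between the teacher's and student's output distributions. A generic sketch of that standard (Hinton-style) KD loss, not the specific PESF-KD formulation; the temperature value is illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax at temperature T."""
    z = np.asarray(logits, float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, T=2.0):
    """Soft-label KD: KL(teacher || student) at temperature T, scaled
    by T^2 so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when student and teacher logits agree, and grows as the student's distribution drifts from the teacher's.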

Interpretable Proof Generation via Iterative Backward Reasoning

1 code implementation NAACL 2022 Hanhao Qu, Yu Cao, Jun Gao, Liang Ding, Ruifeng Xu

We present IBR, an Iterative Backward Reasoning model for proof generation tasks on rule-based Question Answering (QA), where models are required to reason over a series of textual rules and facts to find the relevant proof path and derive the final answer.

Question Answering

Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation

no code implementations 16 Apr 2022 Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, Dacheng Tao

For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g. mBART, the self-supervised pretraining task covers a wide range of monolingual languages, e.g. 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally involve only a bilingual subset, e.g. English-German. This creates a cross-lingual data discrepancy, namely the domain discrepancy, and a cross-lingual learning-objective discrepancy, namely the task discrepancy, between the pretraining and finetuning stages.

Pretrained Language Models Text Generation +2

BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation

no code implementations 16 Apr 2022 Zheng Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, Dacheng Tao

Data augmentations (DA) are the core of achieving robust sequence-to-sequence learning on various natural language processing (NLP) tasks.

Grammatical Error Correction Machine Translation +2

Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

no code implementations CVPR 2022 Lin Zhang, Li Shen, Liang Ding, Dacheng Tao, Ling-Yu Duan

Instead, we propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation.

Federated Learning Knowledge Distillation

Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval

1 code implementation 8 Mar 2022 Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, Dacheng Tao

In contrast to previous works, we focus on the reproducibility of the approaches and the examination of the elements that lead to improved performance by pretrained and non-pretrained models in retrieving images and text.

Information Retrieval

Kernel Packet: An Exact and Scalable Algorithm for Gaussian Process Regression with Matérn Correlations

no code implementations 7 Mar 2022 Haoyuan Chen, Liang Ding, Rui Tuo

We develop an exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations whose smoothness parameter $\nu$ is a half-integer.
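For context, the cubic-cost baseline that scalable algorithms of this kind aim to avoid is exact GP regression via a Cholesky factorization of the Matérn covariance matrix. A minimal one-dimensional sketch with the Matérn kernel at half-integer smoothness $\nu = 5/2$; the lengthscale and noise values are illustrative, and the kernel-packet algorithm itself is not reproduced here.

```python
import numpy as np

def matern52(x1, x2, lengthscale=1.0):
    """Matérn kernel with half-integer smoothness nu = 5/2."""
    r = np.abs(x1[:, None] - x2[None, :]) / lengthscale
    s = np.sqrt(5.0) * r
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_predict(X, y, Xs, lengthscale=1.0, noise=1e-6):
    """Exact GP posterior mean at test points Xs. The Cholesky solve is
    O(n^3) in the number of training points -- the cost that exact
    scalable algorithms for Matérn GPs are designed to avoid."""
    K = matern52(X, X, lengthscale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return matern52(Xs, X, lengthscale) @ alpha
```

With near-zero noise the posterior mean interpolates the training data, as expected for an exact GP.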

Improving Neural Machine Translation by Denoising Training

no code implementations 19 Jan 2022 Liang Ding, Keqin Peng, Dacheng Tao

We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.

Denoising Knowledge Distillation +2

Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis

1 code implementation 13 Jan 2022 Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, Dacheng Tao

To this end, we propose a knowledge graph augmented network (KGAN), which aims to effectively incorporate external knowledge with explicitly syntactic and contextual information.

Aspect-Based Sentiment Analysis Knowledge Graphs +1

A Sparse Expansion For Deep Gaussian Processes

no code implementations 11 Dec 2021 Liang Ding, Rui Tuo, Shahin Shahrampour

In this work, we propose an efficient scheme for accurate inference and prediction based on a range of Gaussian Processes, called the Tensor Markov Gaussian Processes (TMGP).

Gaussian Processes

Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis

1 code implementation 26 Oct 2021 Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, Dacheng Tao

In practice, we formulate the model pretrained on the sampled instances into a knowledge guidance model and a learner model, respectively.

Aspect-Based Sentiment Analysis Transfer Learning

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

1 code implementation Findings (EMNLP) 2021 Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT).

Machine Translation Translation

FLBoost: On-the-Fly Fine-tuning Boosts Federated Learning via Data-free Distillation

no code implementations 29 Sep 2021 Lin Zhang, Li Shen, Liang Ding, Dacheng Tao, Ling-Yu Duan

Instead, we propose a new solution, dubbed FLBoost: fine-tuning the global model on the fly in the server via data-free distillation, which boosts performance and relieves the issue of direct model aggregation.

Federated Learning

Improving Neural Machine Translation by Bidirectional Training

no code implementations EMNLP 2021 Liang Ding, Di Wu, Dacheng Tao

We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.

Machine Translation Translation

Progressive Multi-Granularity Training for Non-Autoregressive Translation

no code implementations Findings (ACL) 2021 Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu

Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence.


Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation

1 code implementation ACL 2021 Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu

Results demonstrate that the proposed approach can significantly and universally improve translation quality by reducing translation errors on low-frequency words.

Knowledge Distillation Translation

Self-Guided Curriculum Learning for Neural Machine Translation

no code implementations ACL (IWSLT) 2021 Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda

In the field of machine learning, a well-trained model is assumed to be able to recover the training labels, i.e., the synthetic labels predicted by the model should be as close to the ground-truth labels as possible.

Machine Translation Translation

Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

no code implementations 13 Apr 2021 Di Wu, Yiren Chen, Liang Ding, Dacheng Tao

A spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones.

Automatic Speech Recognition Denoising +5

Towards Efficiently Diversifying Dialogue Generation via Embedding Augmentation

1 code implementation 2 Mar 2021 Yu Cao, Liang Ding, Zhiliang Tian, Meng Fang

Dialogue generation models face the challenge of producing generic and repetitive responses.

Dialogue Generation

Unsupervised Word Alignment via Cross-Lingual Contrastive Learning

no code implementations 1 Jan 2021 Di Wu, Liang Ding, Shuo Yang, Dacheng Tao

Recently, the performance of the neural word alignment models has exceeded that of statistical models.

Contrastive Learning Translation +1

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning

1 code implementation ICLR 2021 Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Zhaopeng Tu

Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks.

Grammatical Error Correction Machine Translation +3

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

no code implementations ICLR 2021 Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu

To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing the lexical choice of the NAT model with that embedded in the raw data.

Knowledge Distillation Translation
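The extra KL term described in this entry compares two categorical distributions over the target vocabulary: the NAT model's lexical choice and a distribution estimated from the raw data. A generic sketch of such a regularizer; the function name and the way the raw-data distribution is estimated are assumptions, not the paper's exact formulation.

```python
import numpy as np

def lexical_kl(raw_probs, model_probs, eps=1e-12):
    """KL(raw || model) over the target vocabulary for one source word:
    penalizes the model when its lexical choice drifts from the
    raw-data translation distribution. eps guards against log(0)."""
    p = np.asarray(raw_probs, float) + eps
    q = np.asarray(model_probs, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

The term vanishes when the two distributions agree and grows as they diverge, so adding it to the training loss pulls the model's lexical choice back toward the raw data.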

Context-Aware Cross-Attention for Non-Autoregressive Translation

1 code implementation COLING 2020 Liang Ding, Longyue Wang, Di Wu, Dacheng Tao, Zhaopeng Tu

Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence.


Sample and Computationally Efficient Stochastic Kriging in High Dimensions

no code implementations 14 Oct 2020 Liang Ding, Xiaowei Zhang

However, its use is limited to cases where the design space is low-dimensional because, in general, the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space.

Zero-Shot Translation Quality Estimation with Explicit Cross-Lingual Patterns

no code implementations WMT (EMNLP) 2020 Lei Zhou, Liang Ding, Koichi Takeda

In response to this issue, we propose to expose explicit cross-lingual patterns, e.g. word alignments and generation scores, to our proposed zero-shot models.


High-Dimensional Non-Parametric Density Estimation in Mixed Smooth Sobolev Spaces

no code implementations 5 Jun 2020 Liang Ding, Lu Zou, Wenjia Wang, Shahin Shahrampour, Rui Tuo

Density estimation plays a key role in many tasks in machine learning, statistical inference, and visualization.

Density Estimation

Self-Attention with Cross-Lingual Position Representation

no code implementations ACL 2020 Liang Ding, Longyue Wang, Dacheng Tao

Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.

Machine Translation Natural Language Processing +1

Recurrent Graph Syntax Encoder for Neural Machine Translation

no code implementations 19 Aug 2019 Liang Ding, Dacheng Tao

Syntax-incorporated machine translation models have proven successful in improving the model's reasoning and meaning-preservation ability.

Machine Translation Translation

Efficient Learning of Optimal Markov Network Topology with k-Tree Modeling

1 code implementation 21 Jan 2018 Liang Ding, Di Chang, Russell Malmberg, Aaron Martinez, David Robinson, Matthew Wicker, Hongfei Yan, Liming Cai

The seminal work of Chow and Liu (1968) shows that approximation of a finite probabilistic system by Markov trees can achieve the minimum information loss with the topology of a maximum spanning tree.
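The Chow-Liu construction referenced above reduces to a maximum spanning tree over pairwise mutual information between variables. A minimal sketch for discrete data using Kruskal's algorithm with union-find; the paper's k-tree generalization is not implemented here, and the function names are illustrative.

```python
import numpy as np
from itertools import combinations

def mutual_info(x, y):
    """Empirical mutual information between two discrete variables."""
    mi = 0.0
    for a in set(x):
        for b in set(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_tree(data):
    """Chow & Liu (1968): a maximum spanning tree over pairwise mutual
    information gives the minimum-information-loss tree approximation.
    data has shape (n_samples, n_vars); returns a list of tree edges."""
    n_vars = data.shape[1]
    edges = sorted(
        ((mutual_info(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))          # union-find forest
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:                 # Kruskal: greedily add edges
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On data where one variable is a copy of another, the copied pair carries the highest mutual information and is selected first.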
