no code implementations • WMT (EMNLP) 2020 • Longyue Wang, Zhaopeng Tu, Xing Wang, Li Ding, Liang Ding, Shuming Shi
This paper describes the Tencent AI Lab's submission to the WMT 2020 shared task on chat translation for English-German.
1 code implementation • ACL 2022 • Liang Ding, Longyue Wang, Shuming Shi, DaCheng Tao, Zhaopeng Tu
In this work, we provide an appealing alternative for NAT: monolingual KD, which trains the NAT student on external monolingual data with an AT teacher trained on the original bilingual data.
no code implementations • ACL (IWSLT) 2021 • Liang Ding, DaCheng Tao
Our constrained system is based on a pipeline framework, i.e., ASR followed by NMT.
no code implementations • 2 May 2024 • Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, DaCheng Tao
Specifically, we augment the arbitrary first-order logical queries via binary tree decomposition, to stimulate the reasoning capability of LLMs.
1 code implementation • 29 Apr 2024 • Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, DaCheng Tao, Min Zhang
Experimental results show that MMT models trained on our dataset exhibit a greater ability to exploit visual information than those trained on other MMT datasets.
no code implementations • 23 Apr 2024 • Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, DaCheng Tao
The Chain-of-Thought prompting strategy has enhanced the performance of Large Language Models (LLMs) across various NLP tasks.
Ranked #1 on Math Word Problem Solving on SVAMP (Accuracy metric)
no code implementations • 27 Mar 2024 • Xintong Wang, Jingheng Pan, Liang Ding, Chris Biemann
Our method is inspired by our observation that so-called disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules.
no code implementations • 21 Mar 2024 • Changtong Zan, Liang Ding, Li Shen, Yibing Zhan, Weifeng Liu, DaCheng Tao
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
no code implementations • 20 Mar 2024 • Lu Zou, Liang Ding
By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively.
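The back-fitting iteration analyzed above can be sketched for a generic additive model (a minimal illustration with a toy k-NN smoother and synthetic data, not the paper's kernel-packet algorithm):

```python
import numpy as np

def backfit(X, y, smooth, n_iter=20):
    # Generic back-fitting for an additive model y ≈ mean + f1(x1) + ... + fd(xd):
    # each component is repeatedly refit to the partial residual that excludes it.
    n, d = X.shape
    f = np.zeros((n, d))
    mean = y.mean()
    for _ in range(n_iter):
        for j in range(d):
            partial = y - mean - f.sum(axis=1) + f[:, j]
            f[:, j] = smooth(X[:, j], partial)
            f[:, j] -= f[:, j].mean()   # keep components centered
    return mean, f

def knn_smooth(x, r, k=5):
    # toy 1-D smoother: average the k nearest neighbors along the sorted axis
    order = np.argsort(x)
    fitted = np.empty_like(r)
    for i in range(len(x)):
        pos = np.searchsorted(x[order], x[i])
        nbrs = order[max(0, pos - k // 2):][:k]
        fitted[i] = r[nbrs].mean()
    return fitted

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)
mean, f = backfit(X, y, knn_smooth)
resid = y - mean - f.sum(axis=1)
print(np.var(resid) < np.var(y))  # True: the fit explains most of the variance
```

The paper's contribution concerns the convergence rate of exactly this kind of sweep; the smoother above is only a stand-in for the GP component updates.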
1 code implementation • 15 Mar 2024 • Ziyang Xu, Keqin Peng, Liang Ding, DaCheng Tao, Xiliang Lu
Experiments across various prompts, PLMs, and benchmarks show that our approach can not only correct the overfitted performance caused by prompt bias, but also significantly improve the prompt retrieval capability (up to 10% absolute performance gain).
no code implementations • 5 Mar 2024 • Zhonghai Wang, Jie Jiang, Yibing Zhan, Bohao Zhou, Yanhong Li, Chong Zhang, Liang Ding, Hua Jin, Jun Peng, Xu Lin, Weifeng Liu
3) We introduce a standardized benchmark for evaluating medical LLMs in anesthesiology.
no code implementations • 20 Feb 2024 • Zhiyao Ren, Yibing Zhan, Baosheng Yu, Liang Ding, DaCheng Tao
The copilot framework, which aims to enhance and tailor large language models (LLMs) for specific complex tasks without requiring fine-tuning, is gaining increasing attention from the community.
no code implementations • 19 Feb 2024 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical.
no code implementations • 19 Feb 2024 • Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, DaCheng Tao
Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model.
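The standard KD objective mentioned here can be sketched as the classic Hinton-style loss (the temperature, weighting, and toy logits below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Classic KD (Hinton et al., 2015): blend cross-entropy on hard labels with
    # the KL divergence between temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient magnitudes comparable.
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    soft = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1).mean() * T * T
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * soft + (1 - alpha) * hard

logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
labels = np.array([0, 1])
print(round(kd_loss(logits, logits, labels, alpha=1.0), 6))  # 0.0: soft term vanishes when student matches teacher
```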
no code implementations • 19 Feb 2024 • Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, DaCheng Tao
Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment.
no code implementations • 14 Feb 2024 • Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, DaCheng Tao
Inspired by this finding, we propose the Integrated Cluster Deviation Score (ICDS), which quantifies deviations in the latent space, as an indicator of reward overoptimization to facilitate the development of online mitigation strategies.
no code implementations • 6 Feb 2024 • Liang Ding, Rui Tuo
It is well known that the state space (SS) model formulation of a Gaussian process (GP) can lower both its training and prediction time to $\mathcal{O}(n)$ for $n$ data points.
no code implementations • 22 Jan 2024 • Keqin Peng, Liang Ding, Yancheng Yuan, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
1 code implementation • 12 Jan 2024 • Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, DaCheng Tao
Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks largely neglect object-oriented programming (OOP) in favor of functional programming (FP), e.g., HumanEval and MBPP.
1 code implementation • 12 Jan 2024 • Yuqi Zhang, Liang Ding, Lefei Zhang, DaCheng Tao
Aligning large language models (LLMs) with human values, particularly in the face of complex and stealthy jailbreak attacks, presents a formidable challenge.
no code implementations • 12 Jan 2024 • Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, DaCheng Tao
Sentiment analysis is rapidly advancing by utilizing various data modalities (e.g., text, image).
no code implementations • 11 Jan 2024 • Shilong Pan, Zhiliang Tian, Liang Ding, Zhen Huang, Zhihua Wen, Dongsheng Li
POMP involves constructing a directed acyclic meta-graph for each source language, from which we dynamically sample multiple paths to prompt LLMs to mitigate the linguistic noise and improve translations during training.
1 code implementation • 11 Dec 2023 • Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, DaCheng Tao
At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model.
no code implementations • 9 Dec 2023 • Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu, Bo Du
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.
1 code implementation • 26 Nov 2023 • Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong
The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together.
no code implementations • 20 Oct 2023 • Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
The key algorithm in solving ZSAQ is the SAM-SGA optimization, which aims to improve the quantization accuracy and model generalization via optimizing a minimax problem.
1 code implementation • 15 Oct 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao
Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly as the number of activated experts increases, limiting its practical utility.
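The sparse routing such MoE layers rely on can be sketched as top-k gating (the toy router, expert shapes, and data below are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=1):
    # Sparse MoE layer: a router scores all experts, but only the top-k are
    # run per token, so compute scales with k rather than the expert count.
    logits = x @ gate_w                        # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        g = np.exp(logits[t, sel])
        g /= g.sum()                           # renormalize selected gates
        for w, e in zip(g, sel):
            out[t] += w * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
# each expert is a small linear map (default-arg trick binds a fresh W per lambda)
experts = [lambda v, W=rng.standard_normal((d, d)) / d: v @ W for _ in range(n_exp)]
x = rng.standard_normal((5, d))
gate_w = rng.standard_normal((d, n_exp))
y = moe_forward(x, gate_w, experts, k=1)
print(y.shape)  # (5, 8)
```

Increasing `k` here directly multiplies the per-token expert compute, which is the escalation the abstract refers to.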
no code implementations • 15 Oct 2023 • Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, DaCheng Tao
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost.
1 code implementation • 28 Sep 2023 • Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng Liu, DaCheng Tao
Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between language pairs unseen in the training data.
no code implementations • 27 Sep 2023 • Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
Specifically, we categorize existing deep model fusion methods into four classes: (1) "Mode connectivity", which connects solutions in weight space via a path of non-increasing loss to obtain better initializations for model fusion; (2) "Alignment", which matches units between neural networks to create better conditions for fusion; (3) "Weight averaging", a classical model fusion method that averages the weights of multiple models to obtain more accurate results closer to the optimal solution; (4) "Ensemble learning", which combines the outputs of diverse models, a foundational technique for improving the accuracy and robustness of the final model.
no code implementations • 30 Aug 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao
Adapter tuning, which updates only a few parameters, has become a mainstream method for adapting pretrained language models to downstream tasks.
no code implementations • 29 Aug 2023 • Qingyue Wang, Liang Ding, Yanan Cao, Zhiliang Tian, Shi Wang, DaCheng Tao, Li Guo
We evaluate our method on both open and closed LLMs, and the experiments on the widely-used public dataset show that our method can generate more consistent responses in a long-context conversation.
1 code implementation • 24 Aug 2023 • Fei Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding
The multimedia community has shown a significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, and among them, vision-language pretraining (VLP) is currently the most captivating topic.
no code implementations • 30 Jul 2023 • Yan Sun, Li Shen, Hao Sun, Liang Ding, DaCheng Tao
Adaptive optimization has achieved notable success in distributed learning, but extending adaptive optimizers to federated learning (FL) suffers from severe inefficiency, including (i) rugged convergence due to inaccurate gradient estimation in the global adaptive optimizer, and (ii) client drift exacerbated by local over-fitting with the local adaptive optimizer.
no code implementations • 13 Jul 2023 • Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling
We evaluated our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate the effectiveness of the proposed method for handling data scarcity problems, including long-tailed and few-shot egocentric action recognition.
1 code implementation • 5 Jun 2023 • Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, DaCheng Tao
Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios.
no code implementations • 1 Jun 2023 • Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang, DaCheng Tao, Li Guo
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data.
1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, DaCheng Tao
Token dropping is a recently proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers.
1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy.
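The random masking strategy this line of work revisits can be sketched as the standard BERT 80/10/10 recipe (the `[MASK]` id, vocabulary size, and masking rate below are illustrative assumptions):

```python
import numpy as np

MASK, VOCAB = 103, 30522  # hypothetical [MASK] id and vocab size

def random_mask(tokens, rng, p=0.15):
    # BERT-style random masking: each token is selected with probability p;
    # of the selected tokens, 80% become [MASK], 10% a random token, and 10%
    # stay unchanged. Labels are kept only at selected positions.
    tokens = np.array(tokens)                  # copy; leave the input intact
    labels = np.full_like(tokens, -100)        # -100 = ignored by the loss
    sel = rng.random(tokens.shape) < p
    labels[sel] = tokens[sel]
    roll = rng.random(tokens.shape)
    tokens[sel & (roll < 0.8)] = MASK
    rand = sel & (roll >= 0.8) & (roll < 0.9)
    tokens[rand] = rng.integers(0, VOCAB, rand.sum())
    return tokens, labels

rng = np.random.default_rng(0)
ids = np.arange(1000, 1100)
masked, labels = random_mask(ids, rng)
print(int((labels != -100).sum()), "positions selected for prediction")
```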
no code implementations • 22 May 2023 • Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, DaCheng Tao
However, most mixup methods do not consider the varying degree of learning difficulty at different stages of training, and they generate new samples with one-hot labels, resulting in model over-confidence.
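For reference, the vanilla mixup with one-hot labels that this abstract critiques can be sketched as a convex combination of inputs and labels (the interpolation parameter below is an illustrative default):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # Vanilla mixup: convex-combine two examples and their one-hot labels,
    # producing soft labels instead of a single hard class.
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x_mix, y_mix = mixup(np.ones(4), np.array([1.0, 0.0]),
                     np.zeros(4), np.array([0.0, 1.0]), rng=rng)
print(round(float(y_mix.sum()), 6))  # 1.0: the soft label is still a distribution
```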
1 code implementation • 19 May 2023 • Yan Sun, Li Shen, Shixiang Chen, Liang Ding, DaCheng Tao
In federated learning (FL), a set of local clients is coordinated by a global server to cooperatively train one model with privacy protection.
no code implementations • 5 May 2023 • Liang Ding, Tianyang Hu, Jiahang Jiang, Donghao Li, Wenjia Wang, Yuan Yao
In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces.
no code implementations • 29 Apr 2023 • Lu Zou, HaoYuan Chen, Liang Ding
We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and gradient of these three functions for additive GPs, all in $O(n \log n)$ time.
1 code implementation • 20 Apr 2023 • Chiaming Hsu, Changtong Zan, Liang Ding, Longyue Wang, Xiaoting Wang, Weifeng Liu, Fu Lin, Wenbin Hu
Experiments on WMT17-EnZh XRE also show the effectiveness of our Prompt-XRE against other competitive baselines.
no code implementations • 7 Apr 2023 • Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, DaCheng Tao
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
1 code implementation • 24 Mar 2023 • Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, DaCheng Tao
To further improve the performance of LLMs on MT quality assessment, we investigate several prompting designs, and propose a new prompting method called Error Analysis Prompting (EAPrompt) by combining Chain-of-Thought (Wei et al., 2022) and Error Analysis (Lu et al., 2023).
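A hypothetical template in the spirit of EAPrompt, combining a chain-of-thought error-listing step with a final score; the exact wording, languages, and scoring scale used in the paper will differ:

```python
def ea_prompt(src, hyp, src_lang="Chinese", tgt_lang="English"):
    # Hypothetical Error-Analysis-Prompting-style template: ask the LLM to
    # first enumerate major/minor errors (the chain-of-thought step), then
    # derive a score from the error counts.
    return (
        f"Source ({src_lang}): {src}\n"
        f"Translation ({tgt_lang}): {hyp}\n"
        "Step 1: List the major errors (wrong meaning) and minor errors "
        "(fluency/terminology) in the translation, one per line.\n"
        "Step 2: Based on the error counts, give a quality score from 0 to 100."
    )

print(ea_prompt("你好，世界", "Hello, world"))
```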
1 code implementation • 24 Mar 2023 • Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao
We show that: 1) the performance of ChatGPT depends largely on temperature, and a lower temperature usually achieves better performance; 2) emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still warrant attention from the MT/NLP community.
no code implementations • 1 Mar 2023 • Chao Xue, Wei Liu, Shuai Xie, Zhenfang Wang, Jiaxing Li, Xuyang Peng, Liang Ding, Shanshan Zhao, Qiong Cao, Yibo Yang, Fengxiang He, Bohua Cai, Rongcheng Bian, Yiyan Zhao, Heliang Zheng, Xiangyang Liu, Dongkai Liu, Daqing Liu, Li Shen, Chang Li, Shijin Zhang, Yukang Zhang, Guanpu Chen, Shixiang Chen, Yibing Zhan, Jing Zhang, Chaoyue Wang, DaCheng Tao
Automated machine learning (AutoML) seeks to build ML models with minimal human effort.
no code implementations • 1 Mar 2023 • Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, DaCheng Tao
Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically to train large-scale deep neural networks, but without theoretical guarantees, owing to the triple difficulty of analyzing the coupled perturbation step, adaptive learning rate, and momentum step.
1 code implementation • 21 Feb 2023 • Yan Sun, Li Shen, Tiansheng Huang, Liang Ding, DaCheng Tao
Federated learning is an emerging distributed machine learning framework which jointly trains a global model via a large number of local devices with data privacy protections.
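The federated training loop described here can be sketched with vanilla FedAvg on a linear model (client data, learning rate, and step counts below are illustrative; this is not the paper's algorithm):

```python
import numpy as np

def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
    # One FedAvg round for linear regression: each client runs a few local
    # gradient steps on its private data, then the server averages the
    # resulting weights (weighted by client data size). Raw data never
    # leaves the clients.
    new_ws, sizes = [], []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        new_ws.append(w)
        sizes.append(len(y))
    return np.average(new_ws, axis=0, weights=np.array(sizes, float))

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.standard_normal((50, 3))
    clients.append((X, X @ true_w + 0.01 * rng.standard_normal(50)))

w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients)
print(np.linalg.norm(w - true_w) < 0.1)  # True: rounds converge toward the shared solution
```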
1 code implementation • 19 Feb 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
no code implementations • 18 Feb 2023 • Qihuang Zhong, Liang Ding, Keqin Peng, Juhua Liu, Bo Du, Li Shen, Yibing Zhan, DaCheng Tao
This technical report briefly describes our JDExplore d-team's submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
no code implementations • 20 Dec 2022 • Baopu Qiu, Liang Ding, Di Wu, Lin Shang, Yibing Zhan, DaCheng Tao
Machine Translation Quality Estimation (QE) is the task of evaluating translation output in the absence of human-written references.
1 code implementation • 20 Dec 2022 • Qingyu Lu, Liang Ding, Liping Xie, Kanjian Zhang, Derek F. Wong, DaCheng Tao
To this end, we augment BARTScore by incorporating the human-like error analysis strategies, namely BARTScore++, where the final score consists of both the evaluations of major errors and minor errors.
no code implementations • 4 Dec 2022 • Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, DaCheng Tao
This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
Ranked #1 on Common Sense Reasoning on ReCoRD
1 code implementation • 2 Dec 2022 • Hexuan Deng, Liang Ding, Xuebo Liu, Meishan Zhang, DaCheng Tao, Min Zhang
Preliminary experiments on En-Zh and En-Ja news domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh).
1 code implementation • 10 Nov 2022 • Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, DaCheng Tao
The main contributions of our work are challenging the common assumption underlying dynamic networks and proposing a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones.
1 code implementation • 11 Oct 2022 • Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, DaCheng Tao
Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization.
1 code implementation • 9 Oct 2022 • Shwai He, Liang Ding, Daize Dong, Miao Zhang, DaCheng Tao
Adapter tuning, which freezes the pretrained language models (PLMs) and only fine-tunes a few extra modules, has become an appealing, efficient alternative to full model fine-tuning.
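The adapter module this line of work builds on can be sketched as a Houlsby-style bottleneck (the dimensions and initialization below are illustrative assumptions):

```python
import numpy as np

class Adapter:
    # Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    # residual connection. In adapter tuning only these small matrices are
    # trained; the PLM weights stay frozen.
    def __init__(self, d_model, d_bottleneck, rng):
        self.W_down = rng.standard_normal((d_model, d_bottleneck)) * 0.02
        self.W_up = np.zeros((d_bottleneck, d_model))  # near-identity init

    def __call__(self, h):
        return h + np.maximum(h @ self.W_down, 0.0) @ self.W_up

rng = np.random.default_rng(0)
adp = Adapter(16, 4, rng)
h = rng.standard_normal((3, 16))
out = adp(h)
print(np.allclose(out, h))  # True: zero-initialized up-projection starts as identity
```

The zero-initialized up-projection is a common trick so that inserting adapters does not perturb the frozen model's behavior at the start of training.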
1 code implementation • 20 Sep 2022 • Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao
As for model sizes, we scale the Transformer-Big up to an extremely large model with nearly 4.7 billion parameters, to fully enhance the model capacity of our Vega-MT.
Ranked #1 on Machine Translation on WMT 2022 English-Russian
1 code implementation • COLING 2022 • Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao
Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
1 code implementation • 22 Aug 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Prompt Transfer (PoT) is a recently proposed approach to improve prompt tuning, which initializes the target prompt with an existing prompt trained on similar source tasks.
no code implementations • 18 Jul 2022 • Chuang Liu, Xueqi Ma, Yibing Zhan, Liang Ding, Dapeng Tao, Bo Du, Wenbin Hu, Danilo Mandic
However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists.
no code implementations • 4 Jul 2022 • Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Li Shen, DaCheng Tao
Although vision-and-language pretraining (VLP) equipped cross-modal image-text retrieval (ITR) has achieved remarkable progress in the past two years, it suffers from a major drawback: the ever-increasing size of VLP models restricts their deployment in real-world search scenarios (where high latency is unacceptable).
1 code implementation • 30 May 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays an important but under-exploited role relative to the decoder regarding downstream performance and neuron activation.
no code implementations • 28 May 2022 • Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, DaCheng Tao
In this paper, we present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer by updating relatively few partial parameters.
1 code implementation • NAACL 2022 • Hanhao Qu, Yu Cao, Jun Gao, Liang Ding, Ruifeng Xu
We present IBR, an Iterative Backward Reasoning model to solve the proof generation tasks on rule-based Question Answering (QA), where models are required to reason over a series of textual rules and facts to find out the related proof path and derive the final answer.
1 code implementation • COLING 2022 • Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, DaCheng Tao
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence.
1 code implementation • 16 Apr 2022 • Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao
For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g., mBART, the self-supervised pretraining task is trained on a wide range of monolingual languages, e.g., 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally progress on a bilingual language subset, e.g., English-German. This creates a data discrepancy (namely domain discrepancy) and a cross-lingual learning-objective discrepancy (namely task discrepancy) between the pretraining and finetuning stages.
no code implementations • 16 Apr 2022 • Zheng Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, DaCheng Tao
Data augmentations (DA) are at the core of achieving robust sequence-to-sequence learning on various natural language processing (NLP) tasks.
no code implementations • 5 Apr 2022 • Shwai He, Chenbo Jiang, Daize Dong, Liang Ding
Dynamic convolution achieves better performance for efficient CNNs at the cost of a negligible increase in FLOPs.
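Dynamic convolution's per-input kernel aggregation can be sketched as follows (the toy attention branch and tensor shapes are assumptions for illustration, not the paper's design):

```python
import numpy as np

def dyn_conv(x, kernels, attn_w):
    # Dynamic convolution: instead of one static kernel, keep K candidate
    # kernels and aggregate them per input with attention weights computed
    # from that input, then apply the single aggregated kernel. FLOPs grow
    # only by the tiny attention branch.
    scores = x.mean(axis=-1, keepdims=True) * attn_w       # toy attention branch -> (batch, K)
    a = np.exp(scores - scores.max(-1, keepdims=True))
    a /= a.sum(-1, keepdims=True)                          # softmax over K kernels
    agg = np.einsum("bk,kio->bio", a, kernels)             # per-sample kernel
    return np.einsum("bi,bio->bo", x, agg)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6))
kernels = rng.standard_normal((3, 6, 4))                   # K=3 candidate kernels
out = dyn_conv(x, kernels, rng.standard_normal(3))
print(out.shape)  # (2, 4)
```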
1 code implementation • CVPR 2022 • Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Ling-Yu Duan
Instead, we propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation.
1 code implementation • 8 Mar 2022 • Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, DaCheng Tao
In contrast to previous works, we focus on the reproducibility of the approaches and the examination of the elements that lead to improved performance for pretrained and non-pretrained models in retrieving images and text.
no code implementations • 7 Mar 2022 • HaoYuan Chen, Liang Ding, Rui Tuo
We develop an exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations whose smoothness parameter $\nu$ is a half-integer.
no code implementations • 19 Jan 2022 • Liang Ding, Keqin Peng, DaCheng Tao
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
1 code implementation • 13 Jan 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, DaCheng Tao
To this end, we propose a knowledge graph augmented network (KGAN), which aims to effectively incorporate external knowledge alongside explicit syntactic and contextual information.
no code implementations • 11 Dec 2021 • Liang Ding, Rui Tuo, Shahin Shahrampour
In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions.
1 code implementation • 26 Oct 2021 • Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, DaCheng Tao
In practice, we instantiate the model pretrained on the sampled instances as a knowledge guidance model and a learner model.
1 code implementation • Findings (EMNLP) 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu
Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT).
no code implementations • 29 Sep 2021 • Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Lingyu Duan
On the contrary, we propose a new solution: fine-tuning the global model on the fly in the server via data-free distillation to boost its performance, dubbed FLBoost, to relieve the issue of direct model aggregation.
no code implementations • EMNLP 2021 • Liang Ding, Di Wu, DaCheng Tao
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
no code implementations • 24 Jul 2021 • Liang Ding, Di Wu, DaCheng Tao
Our constrained system is based on a pipeline framework, i.e., ASR followed by NMT.
no code implementations • 19 Jul 2021 • Liang Ding, Rui Tuo, Xiaowei Zhang
High-dimensional simulation optimization is notoriously challenging.
1 code implementation • Findings (ACL) 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu
In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding.
no code implementations • Findings (ACL) 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence.
1 code implementation • ACL 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
Results demonstrate that the proposed approach can significantly and universally improve translation quality by reducing translation errors on low-frequency words.
no code implementations • ACL (IWSLT) 2021 • Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda
In the field of machine learning, the well-trained model is assumed to be able to recover the training labels, i.e., the synthetic labels predicted by the model should be as close to the ground-truth labels as possible.
no code implementations • 13 Apr 2021 • Di Wu, Yiren Chen, Liang Ding, DaCheng Tao
A spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones.
1 code implementation • 2 Mar 2021 • Yu Cao, Liang Ding, Zhiliang Tian, Meng Fang
Dialogue generation models face the challenge of producing generic and repetitive responses.
no code implementations • IWSLT (ACL) 2022 • Di Wu, Liang Ding, Shuo Yang, Mingyang Li
Recently, the performance of neural word alignment models has exceeded that of statistical models.
no code implementations • 1 Jan 2021 • Di Wu, Liang Ding, Shuo Yang, DaCheng Tao
Recently, the performance of neural word alignment models has exceeded that of statistical models.
no code implementations • ICLR 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu
To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing the lexical choice of NAT model and that embedded in the raw data.
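The extra divergence term can be sketched as a plain KL between a lexical-choice distribution estimated from the raw data and the model's distribution (the direction of the KL and the smoothing constant below are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def lexical_kl(model_probs, raw_probs, eps=1e-9):
    # Sketch of an auxiliary KL(raw || model) term contrasting the lexical
    # distribution embedded in the raw data with the NAT model's prediction;
    # eps smoothing avoids log(0).
    p, q = raw_probs + eps, model_probs + eps
    return float((p * np.log(p / q)).sum())

p = np.array([0.7, 0.2, 0.1])
print(round(lexical_kl(p, p), 6))  # 0.0: the term vanishes when the model matches the raw statistics
```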
1 code implementation • ICLR 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Zhaopeng Tu
Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks.
1 code implementation • COLING 2020 • Liang Ding, Longyue Wang, Di Wu, DaCheng Tao, Zhaopeng Tu
Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence.
no code implementations • 14 Oct 2020 • Liang Ding, Xiaowei Zhang
However, its use is limited to cases where the design space is low-dimensional because, in general, the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space.
no code implementations • WMT (EMNLP) 2020 • Lei Zhou, Liang Ding, Koichi Takeda
In response to this issue, we propose to expose explicit cross-lingual patterns, e.g., word alignments and generation scores, to our proposed zero-shot models.
1 code implementation • EMNLP 2020 • Di Wu, Liang Ding, Fan Lu, Jian Xie
Slot filling and intent detection are the two main tasks in a spoken language understanding (SLU) system.
no code implementations • 5 Jun 2020 • Liang Ding, Lu Zou, Wenjia Wang, Shahin Shahrampour, Rui Tuo
Density estimation plays a key role in many tasks in machine learning, statistical inference, and visualization.
no code implementations • ACL 2020 • Liang Ding, Longyue Wang, DaCheng Tao
Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
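The fixed position indices referred to here follow the original Transformer's sinusoidal recipe, which can be sketched as (an illustration of the standard scheme, not this paper's proposed variant):

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    # Fixed sinusoidal position encoding: even dimensions get sin, odd get
    # cos, with geometrically spaced wavelengths (assumes d_model is even).
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_pe(4, 8)
print(pe[0, 0], pe[0, 1])  # 0.0 1.0 (sin(0) and cos(0) at position 0)
```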
no code implementations • ICML 2020 • Liang Ding, Rui Tuo, Shahin Shahrampour
Despite their success, kernel methods suffer from a massive computational cost in practice.
no code implementations • 19 Aug 2019 • Liang Ding, DaCheng Tao
Syntax-incorporated machine translation models have been proven successful in improving the model's reasoning and meaning preservation ability.
no code implementations • WS 2019 • Liang Ding, DaCheng Tao
This paper describes the University of Sydney's submission to the WMT 2019 shared news translation task.
Ranked #1 on Machine Translation on WMT 2018 Finnish-English
1 code implementation • 21 Jan 2018 • Liang Ding, Di Chang, Russell Malmberg, Aaron Martinez, David Robinson, Matthew Wicker, Hongfei Yan, Liming Cai
The seminal work of Chow and Liu (1968) shows that approximation of a finite probabilistic system by Markov trees can achieve the minimum information loss with the topology of a maximum spanning tree.
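The Chow-Liu construction can be sketched directly: estimate pairwise mutual information between the variables, then take a maximum spanning tree over it (toy binary chain data; the plug-in MI estimator and Kruskal union-find below are illustrative):

```python
import numpy as np
from itertools import combinations

def chow_liu_edges(samples):
    # Chow-Liu (1968): among all tree-structured approximations of a joint
    # distribution, the maximum spanning tree over pairwise mutual
    # information minimizes the information loss (KL to the full joint).
    n, d = samples.shape

    def mi(a, b):  # plug-in mutual information for discrete variables
        m = 0.0
        for x in np.unique(a):
            for y in np.unique(b):
                pxy = np.mean((a == x) & (b == y))
                if pxy > 0:
                    m += pxy * np.log(pxy / (np.mean(a == x) * np.mean(b == y)))
        return m

    parent = list(range(d))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(combinations(range(d), 2),
                   key=lambda e: -mi(samples[:, e[0]], samples[:, e[1]]))
    tree = []
    for i, j in edges:           # greedy Kruskal on -MI = max spanning tree
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

rng = np.random.default_rng(0)
flip = lambda v: v ^ (rng.random(2000) < 0.1)   # flip each bit w.p. 0.1
x0 = rng.integers(0, 2, 2000)
x1 = flip(x0)
x2 = flip(x1)
samples = np.stack([x0, x1, x2], axis=1)        # true structure: 0 - 1 - 2
print(sorted(chow_liu_edges(samples)))  # [(0, 1), (1, 2)]
```

On this chain the direct edges carry more mutual information than the two-hop pair (0, 2), so the recovered tree matches the generating topology.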