Search Results for author: Liang Ding

Found 100 papers, 39 papers with code

Efficient Learning of Optimal Markov Network Topology with k-Tree Modeling

1 code implementation21 Jan 2018 Liang Ding, Di Chang, Russell Malmberg, Aaron Martinez, David Robinson, Matthew Wicker, Hongfei Yan, Liming Cai

The seminal work of Chow and Liu (1968) shows that approximation of a finite probabilistic system by Markov trees can achieve the minimum information loss with the topology of a maximum spanning tree.
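
A minimal, generic sketch of the Chow-Liu construction this abstract references (not the paper's k-tree algorithm): estimate pairwise mutual information from discrete samples and take a maximum spanning tree over it. The library choice (networkx) and the plug-in MI estimator are illustrative assumptions.

```python
# Chow-Liu tree sketch: MI-weighted maximum spanning tree over variables.
import numpy as np
import networkx as nx

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) for two discrete 1-D integer arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def chow_liu_tree(data):
    """data: (n_samples, n_vars) integer matrix -> list of tree edges."""
    n_vars = data.shape[1]
    g = nx.Graph()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            g.add_edge(i, j, weight=mutual_information(data[:, i], data[:, j]))
    return list(nx.maximum_spanning_tree(g).edges())

rng = np.random.default_rng(0)
print(chow_liu_tree(rng.integers(0, 3, size=(500, 5))))
```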

Recurrent Graph Syntax Encoder for Neural Machine Translation

no code implementations19 Aug 2019 Liang Ding, DaCheng Tao

Syntax-incorporated machine translation models have been proven successful in improving the model's reasoning and meaning preservation ability.

Machine Translation NMT +2

Self-Attention with Cross-Lingual Position Representation

no code implementations ACL 2020 Liang Ding, Long-Yue Wang, DaCheng Tao

Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.

Machine Translation Position +2

Zero-Shot Translation Quality Estimation with Explicit Cross-Lingual Patterns

no code implementations WMT (EMNLP) 2020 Lei Zhou, Liang Ding, Koichi Takeda

In response to this issue, we propose to expose explicit cross-lingual patterns, e.g., word alignments and generation score, to our proposed zero-shot models.

Sentence Translation

Sample and Computationally Efficient Stochastic Kriging in High Dimensions

no code implementations14 Oct 2020 Liang Ding, Xiaowei Zhang

However, its use is limited to cases where the design space is low-dimensional because, in general, the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space.

Computational Efficiency Vocal Bursts Intensity Prediction

Context-Aware Cross-Attention for Non-Autoregressive Translation

1 code implementation COLING 2020 Liang Ding, Longyue Wang, Di Wu, DaCheng Tao, Zhaopeng Tu

Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence.

Translation

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning

1 code implementation ICLR 2021 Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Zhaopeng Tu

Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks.

Grammatical Error Correction Machine Translation +3

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

no code implementations ICLR 2021 Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu

To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing the lexical choice of the NAT model with that embedded in the raw data.

Knowledge Distillation Translation
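
A hedged sketch of the KL term described above: pull the NAT model's token-level (lexical) distribution toward a prior estimated from the raw bilingual data. The shape and construction of `raw_data_prior` are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def lexical_kl_loss(student_logits, raw_data_prior, pad_mask):
    """
    student_logits: (batch, seq, vocab) logits from the NAT model
    raw_data_prior: (batch, seq, vocab) token distribution estimated from raw data
    pad_mask:       (batch, seq) with 1 for real tokens, 0 for padding
    """
    log_p = F.log_softmax(student_logits, dim=-1)
    kl = F.kl_div(log_p, raw_data_prior, reduction="none").sum(-1)  # KL per position
    return (kl * pad_mask).sum() / pad_mask.sum()

# Typical use (illustrative): total_loss = nat_loss + lambda_kl * lexical_kl_loss(...)
```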

Unsupervised Word Alignment via Cross-Lingual Contrastive Learning

no code implementations 1 Jan 2021 Di Wu, Liang Ding, Shuo Yang, DaCheng Tao

Recently, the performance of the neural word alignment models has exceeded that of statistical models.

Contrastive Learning Translation +1

Towards Efficiently Diversifying Dialogue Generation via Embedding Augmentation

1 code implementation2 Mar 2021 Yu Cao, Liang Ding, Zhiliang Tian, Meng Fang

Dialogue generation models face the challenge of producing generic and repetitive responses.

Dialogue Generation

Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

no code implementations 13 Apr 2021 Di Wu, Yiren Chen, Liang Ding, DaCheng Tao

A spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

Self-Guided Curriculum Learning for Neural Machine Translation

no code implementations ACL (IWSLT) 2021 Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda

In the field of machine learning, a well-trained model is assumed to be able to recover the training labels, i.e., the synthetic labels predicted by the model should be as close to the ground-truth labels as possible.

Machine Translation NMT +2

Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation

1 code implementation ACL 2021 Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu

Results demonstrate that the proposed approach can significantly and universally improve translation quality by reducing translation errors on low-frequency words.

Knowledge Distillation Translation

Improving Neural Machine Translation by Bidirectional Training

no code implementations EMNLP 2021 Liang Ding, Di Wu, DaCheng Tao

We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.

Machine Translation Translation
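
The abstract does not spell out the data reconstruction, so this is only a hedged sketch of one common way to realize bidirectional pretraining: double the parallel corpus with direction-swapped pairs, pretrain on the doubled data, then fine-tune on the original direction. BiT's exact reconstruction may differ.

```python
def make_bidirectional(pairs):
    """pairs: list of (src, tgt) strings -> doubled list including reversed copies."""
    return pairs + [(tgt, src) for src, tgt in pairs]

corpus = [("guten morgen", "good morning"), ("danke", "thank you")]
print(make_bidirectional(corpus))
# Pretrain the NMT model on the doubled corpus, then continue training
# (fine-tune) on the original src->tgt data only.
```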

FLBoost: On-the-Fly Fine-tuning Boosts Federated Learning via Data-free Distillation

no code implementations29 Sep 2021 Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Lingyu Duan

Instead, we propose a new solution, dubbed FLBoost: fine-tuning the global model on the fly in the server via data-free distillation, which boosts its performance and relieves the issue of direct model aggregation.

Federated Learning

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

1 code implementation Findings (EMNLP) 2021 Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT).

Machine Translation NMT +2

Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis

1 code implementation26 Oct 2021 Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, DaCheng Tao

In practice, we formulate the model pretrained on the sampled instances into a knowledge guidance model and a learner model, respectively.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

A Sparse Expansion For Deep Gaussian Processes

no code implementations11 Dec 2021 Liang Ding, Rui Tuo, Shahin Shahrampour

In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions.

Computational Efficiency Gaussian Processes

Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis

1 code implementation13 Jan 2022 Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, DaCheng Tao

To this end, we propose a knowledge graph augmented network KGAN, which aims to effectively incorporate external knowledge with explicitly syntactic and contextual information.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Improving Neural Machine Translation by Denoising Training

no code implementations19 Jan 2022 Liang Ding, Keqin Peng, DaCheng Tao

We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.

Denoising Knowledge Distillation +2

Kernel Packet: An Exact and Scalable Algorithm for Gaussian Process Regression with Matérn Correlations

no code implementations7 Mar 2022 HaoYuan Chen, Liang Ding, Rui Tuo

We develop an exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations whose smoothness parameter $\nu$ is a half-integer.

regression
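
For context only, this is the plain O(n^3) GP posterior-mean computation with a Matérn-5/2 kernel (smoothness ν = 5/2, a half-integer), i.e., the baseline that exact-and-scalable methods such as the paper's Kernel Packet algorithm accelerate; it is not that algorithm, and hyperparameters are illustrative.

```python
import numpy as np

def matern52(x1, x2, lengthscale=0.3):
    d = np.abs(x1[:, None] - x2[None, :]) / lengthscale
    return (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-4):
    K = matern52(x_train, x_train) + noise * np.eye(len(x_train))   # O(n^3) solve
    return matern52(x_test, x_train) @ np.linalg.solve(K, y_train)

x = np.linspace(0, 1, 200)
y = np.sin(6 * x) + 0.05 * np.random.default_rng(0).standard_normal(200)
print(gp_posterior_mean(x, y, np.array([0.25, 0.75])))
```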

Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval

1 code implementation8 Mar 2022 Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, DaCheng Tao

In contrast to previous works, we focus on the reproducibility of the approaches and the examination of the elements that lead to improved performance by pretrained and nonpretrained models in retrieving images and text.

Information Retrieval Retrieval +1

Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

1 code implementation CVPR 2022 Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Ling-Yu Duan

Instead, we propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation.

Data-free Knowledge Distillation Federated Learning

SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution

no code implementations5 Apr 2022 Shwai He, Chenbo Jiang, Daize Dong, Liang Ding

Dynamic convolution achieves better performance for efficient CNNs at the cost of negligible FLOPs increase.

Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding

1 code implementation16 Apr 2022 Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao

For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g., mBART, the self-supervised pretraining task is trained on a wide range of monolingual languages, e.g., 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally progress on a bilingual language subset, e.g., English-German. This creates a data discrepancy (namely, domain discrepancy) and a cross-lingual learning objective discrepancy (namely, task discrepancy) between the pretraining and finetuning stages.

Cross-Lingual Natural Language Inference nlg evaluation +4

BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation

no code implementations16 Apr 2022 Zheng Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, DaCheng Tao

Data augmentation (DA) is at the core of achieving robust sequence-to-sequence learning on various natural language processing (NLP) tasks.

Grammatical Error Correction Machine Translation +1

A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis

1 code implementation COLING 2022 Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, DaCheng Tao

Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +4

Interpretable Proof Generation via Iterative Backward Reasoning

1 code implementation NAACL 2022 Hanhao Qu, Yu Cao, Jun Gao, Liang Ding, Ruifeng Xu

We present IBR, an Iterative Backward Reasoning model to solve the proof generation tasks on rule-based Question Answering (QA), where models are required to reason over a series of textual rules and facts to find out the related proof path and derive the final answer.

Question Answering

Parameter-Efficient and Student-Friendly Knowledge Distillation

no code implementations28 May 2022 Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, DaCheng Tao

In this paper, we present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer by updating relatively few partial parameters.

Knowledge Distillation Transfer Learning

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation

1 code implementation30 May 2022 Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays an important but under-exploited role, relative to the decoder, in downstream performance and neuron activation.

Denoising Language Modelling +2

Dynamic Contrastive Distillation for Image-Text Retrieval

no code implementations4 Jul 2022 Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Li Shen, DaCheng Tao

Although the vision-and-language pretraining (VLP) equipped cross-modal image-text retrieval (ITR) has achieved remarkable progress in the past two years, it suffers from a major drawback: the ever-increasing size of VLP models restricts its deployment to real-world search scenarios (where the high latency is unacceptable).

Contrastive Learning Metric Learning +3

Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks

no code implementations18 Jul 2022 Chuang Liu, Xueqi Ma, Yibing Zhan, Liang Ding, Dapeng Tao, Bo Du, Wenbin Hu, Danilo Mandic

However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists.

Node Classification

PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation

no code implementations22 Aug 2022 Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

In response to these problems, we propose a new metric to accurately predict the prompt transferability (regarding (i)), and a novel PoT approach (namely PANDA) that leverages the knowledge distillation technique to transfer the "knowledge" from the source prompt to the target prompt in a subtle manner and alleviate the catastrophic forgetting effectively (regarding (ii)).

Knowledge Distillation Transfer Learning

Vega-MT: The JD Explore Academy Translation System for WMT22

1 code implementation20 Sep 2022 Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao

As for model sizes, we scale the Transformer-Big up to an extremely large model with nearly 4.7 billion parameters, to fully enhance the model capacity for our Vega-MT.

Data Augmentation Machine Translation +1

SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters

1 code implementation9 Oct 2022 Shwai He, Liang Ding, Daize Dong, Miao Zhang, DaCheng Tao

Adapter Tuning, which freezes the pretrained language models (PLMs) and only fine-tunes a few extra modules, becomes an appealing efficient alternative to the full model fine-tuning.

Network Pruning
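
An illustrative bottleneck adapter in the generic Adapter Tuning setup described above: the pretrained layer is frozen and only the small inserted module is trained. SparseAdapter's pruning of the adapter weights is not shown; sizes and names here are assumptions.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter inserted after a frozen PLM layer."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

backbone = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in backbone.parameters():          # freeze the pretrained layer
    p.requires_grad = False
adapter = Adapter()                      # only these parameters are trained

out = adapter(backbone(torch.randn(2, 16, 768)))
print(out.shape)                         # torch.Size([2, 16, 768])
```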

Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models

1 code implementation11 Oct 2022 Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, DaCheng Tao

Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization.

PAD-Net: An Efficient Framework for Dynamic Networks

1 code implementation10 Nov 2022 Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, DaCheng Tao

The main contributions of our work are challenging the basic commonsense in dynamic networks and proposing a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones.

Image Classification

Improving Simultaneous Machine Translation with Monolingual Data

1 code implementation2 Dec 2022 Hexuan Deng, Liang Ding, Xuebo Liu, Meishan Zhang, DaCheng Tao, Min Zhang

Preliminary experiments on En-Zh and En-Ja news domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh).

Hallucination Knowledge Distillation +4

Original or Translated? On the Use of Parallel Data for Translation Quality Estimation

no code implementations 20 Dec 2022 Baopu Qiu, Liang Ding, Di Wu, Lin Shang, Yibing Zhan, DaCheng Tao

Machine Translation Quality Estimation (QE) is the task of evaluating translation output in the absence of human-written references.

Data Augmentation Machine Translation +2

Toward Human-Like Evaluation for Natural Language Generation with Error Analysis

1 code implementation20 Dec 2022 Qingyu Lu, Liang Ding, Liping Xie, Kanjian Zhang, Derek F. Wong, DaCheng Tao

To this end, we augment BARTScore by incorporating the human-like error analysis strategies, namely BARTScore++, where the final score consists of both the evaluations of major errors and minor errors.

Language Modelling Machine Translation +2

Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE

no code implementations18 Feb 2023 Qihuang Zhong, Liang Ding, Keqin Peng, Juhua Liu, Bo Du, Li Shen, Yibing Zhan, DaCheng Tao

This technical report briefly describes our JDExplore d-team's submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.

Contrastive Learning Denoising +12

Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT

1 code implementation19 Feb 2023 Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.

Question Answering Sentiment Analysis

FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy

1 code implementation21 Feb 2023 Yan Sun, Li Shen, Tiansheng Huang, Liang Ding, DaCheng Tao

Federated learning is an emerging distributed machine learning framework which jointly trains a global model via a large number of local devices with data privacy protections.

Federated Learning

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks

no code implementations1 Mar 2023 Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, DaCheng Tao

Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically to train large-scale deep neural networks without theoretical guarantee due to the triple difficulties in analyzing the coupled perturbation step, adaptive learning rate and momentum step.
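
A minimal sketch of one sharpness-aware minimization (SAM) step: take a gradient, ascend to a nearby worst-case point, take the gradient there, restore the weights, and update with a base optimizer (e.g., Adam, giving the adaptive and momentum parts of AdaSAM). The paper's contribution is the convergence analysis of this coupling, which this simplified loop does not capture.

```python
import torch

def sam_step(model, loss_fn, data, target, base_opt, rho=0.05):
    # Assumes every parameter receives a gradient from the loss.
    base_opt.zero_grad()
    loss_fn(model(data), target).backward()              # gradient at current weights
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    with torch.no_grad():                                # ascend: w + rho * g / ||g||
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / norm)
    base_opt.zero_grad()
    loss_fn(model(data), target).backward()              # gradient at perturbed point
    with torch.no_grad():                                # restore original weights
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / norm)
    base_opt.step()                                      # adaptive/momentum update
```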

Towards Making the Most of ChatGPT for Machine Translation

1 code implementation24 Mar 2023 Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao

We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually can achieve better performance; 2) Emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still need to be highlighted for the MT/NLP community.

In-Context Learning Machine Translation +2

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models

1 code implementation24 Mar 2023 Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, DaCheng Tao

To further improve the performance of LLMs on MT quality assessment, we investigate several prompting designs, and propose a new prompting method called Error Analysis Prompting (EAPrompt) by combining Chain-of-Thoughts (Wei et al., 2022) and Error Analysis (Lu et al., 2023).

Machine Translation Natural Language Understanding +3

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

no code implementations7 Apr 2023 Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, DaCheng Tao

The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.

Prompt-Learning for Cross-Lingual Relation Extraction

1 code implementation20 Apr 2023 Chiaming Hsu, Changtong Zan, Liang Ding, Longyue Wang, Xiaoting Wang, Weifeng Liu, Fu Lin, Wenbin Hu

Experiments on WMT17-EnZh XRE also show the effectiveness of our Prompt-XRE against other competitive baselines.

Relation Relation Extraction +1

Representing Additive Gaussian Processes by Sparse Matrices

no code implementations29 Apr 2023 Lu Zou, HaoYuan Chen, Liang Ding

We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and gradient of these three functions for additive GPs, all in $O(n \log n)$ time.

Additive models Bayesian Optimization +1

Random Smoothing Regularization in Kernel Gradient Descent Learning

no code implementations5 May 2023 Liang Ding, Tianyang Hu, Jiahang Jiang, Donghao Li, Wenjia Wang, Yuan YAO

In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces.

Data Augmentation

Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape

no code implementations19 May 2023 Yan Sun, Li Shen, Shixiang Chen, Liang Ding, DaCheng Tao

In federated learning (FL), a cluster of local clients are chaired under the coordination of the global server and cooperatively train one model with privacy protection.

Federated Learning

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

no code implementations22 May 2023 Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, DaCheng Tao

However, most mixup methods do not consider the varying degree of learning difficulty in different stages of training, and they generate new samples with one-hot labels, resulting in model overconfidence.

Data Augmentation Few-Shot Text Classification +1
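
Vanilla mixup, the baseline critiqued above: convex combinations of inputs and one-hot labels. The paper's self-evolution variant replaces these hard interpolated labels with softer, stage-aware targets, which is not shown here.

```python
import numpy as np

def mixup(x, y_onehot, alpha=0.2, seed=0):
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)                  # interpolation coefficient
    idx = rng.permutation(len(x))                 # random pairing within the batch
    return lam * x + (1 - lam) * x[idx], lam * y_onehot + (1 - lam) * y_onehot[idx]

x = np.random.default_rng(1).standard_normal((8, 16))      # e.g., text features
y = np.eye(4)[np.random.default_rng(2).integers(0, 4, 8)]  # one-hot labels, 4 classes
x_mix, y_mix = mixup(x, y)
print(x_mix.shape, y_mix.sum(axis=1))             # soft labels still sum to 1
```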

Self-Evolution Learning for Discriminative Language Model Pretraining

1 code implementation24 May 2023 Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy.

Language Modelling Masked Language Modeling +1
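
The BERT-style random masking baseline the abstract refers to, in its simplest form (a production implementation would also apply the usual 80/10/10 mask/replace/keep scheme). The paper's self-evolution strategy instead targets which tokens to mask, which this sketch does not do.

```python
import random

def random_mask(tokens, mask_token="[MASK]", prob=0.15, seed=0):
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < prob:
            masked.append(mask_token)
            labels.append(tok)        # the MLM loss predicts the original token
        else:
            masked.append(tok)
            labels.append(None)       # ignored by the MLM loss
    return masked, labels

print(random_mask("the quick brown fox jumps over the lazy dog".split()))
```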

Revisiting Token Dropping Strategy in Efficient BERT Pretraining

1 code implementation24 May 2023 Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, DaCheng Tao

Token dropping is a recently-proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers.

Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking

no code implementations1 Jun 2023 Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang, DaCheng Tao, Li Guo

Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data.

Dialogue State Tracking Transfer Learning

Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training

1 code implementation5 Jun 2023 Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, DaCheng Tao

Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios.

Contrastive Learning Retrieval
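
A generic in-batch contrastive (InfoNCE) objective of the kind used to pretrain dense retrievers. How the paper constructs relevance-aware positive pairs is its contribution and is not reproduced; names and the temperature value are illustrative.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, passage_emb, temperature=0.05):
    """query_emb, passage_emb: (batch, dim); row i of each forms a positive pair."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.t() / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))          # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

print(in_batch_contrastive_loss(torch.randn(4, 128), torch.randn(4, 128)).item())
```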

Free-Form Composition Networks for Egocentric Action Recognition

no code implementations13 Jul 2023 Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling

We evaluated our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate the effectiveness of the proposed method for handling data scarcity problems, including long-tailed and few-shot egocentric action recognition.

Action Recognition Temporal Action Localization

Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup

no code implementations30 Jul 2023 Yan Sun, Li Shen, Hao Sun, Liang Ding, DaCheng Tao

Adaptive optimization has achieved notable success for distributed learning, while extending adaptive optimizers to federated learning (FL) suffers from severe inefficiency, including (i) rugged convergence due to inaccurate gradient estimation in the global adaptive optimizer, and (ii) client drift exacerbated by local over-fitting with the local adaptive optimizer.

Federated Learning

Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?

1 code implementation24 Aug 2023 Fei Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding

The multimedia community has shown significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, and among them, vision-language pretraining (VLP) is currently the most captivating topic.

Attribute Negation +1

Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models

no code implementations29 Aug 2023 Qingyue Wang, Liang Ding, Yanan Cao, Zhiliang Tian, Shi Wang, DaCheng Tao, Li Guo

We evaluate our method on both open and closed LLMs, and the experiments on the widely-used public dataset show that our method can generate more consistent responses in a long-context conversation.

Chatbot

MerA: Merging Pretrained Adapters For Few-Shot Learning

no code implementations30 Aug 2023 Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao

Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained language models to downstream tasks.

Few-Shot Learning MRPC

Deep Model Fusion: A Survey

no code implementations27 Sep 2023 Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen

Specifically, we categorize existing deep model fusion methods as four-fold: (1) "Mode connectivity", which connects the solutions in weight space via a path of non-increasing loss, in order to obtain better initialization for model fusion; (2) "Alignment" matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method, averages the weights of multiple models to obtain more accurate results closer to the optimal solution; (4) "Ensemble learning" combines the outputs of diverse models, which is a foundational technique for improving the accuracy and robustness of the final model.

Ensemble Learning
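
A minimal sketch of category (3), weight averaging, assuming all models share an identical architecture; the other fusion families surveyed above (mode connectivity, alignment, ensembling) are more involved and not shown.

```python
import torch
import torch.nn as nn

def average_state_dicts(state_dicts):
    """Element-wise average of parameter tensors across models."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0)
            for k in state_dicts[0]}

models = [nn.Linear(10, 2) for _ in range(3)]
fused = nn.Linear(10, 2)
fused.load_state_dict(average_state_dicts([m.state_dict() for m in models]))
print(fused.weight.shape)   # torch.Size([2, 10])
```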

Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation

1 code implementation28 Sep 2023 Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng Liu, DaCheng Tao

Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between unseen language pairs in training data.

Machine Translation Navigate +2
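
A generic unlikelihood objective as named in the title: on negative samples, the model is pushed to lower the probability of the observed tokens by maximizing log(1 - p). How negative samples are constructed for zero-shot translation is the paper's contribution and is not shown.

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, negative_targets, eps=1e-6):
    """logits: (batch, seq, vocab); negative_targets: (batch, seq) token ids."""
    probs = F.softmax(logits, dim=-1)
    p_neg = probs.gather(-1, negative_targets.unsqueeze(-1)).squeeze(-1)
    return -torch.log(1.0 - p_neg + eps).mean()   # penalize mass on negative tokens

print(unlikelihood_loss(torch.randn(2, 5, 100), torch.randint(0, 100, (2, 5))).item())
```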

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

1 code implementation15 Oct 2023 Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao

Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly if increasing the number of activated experts, limiting its practical utility.

Computational Efficiency
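
A standard sparsely gated top-k MoE layer, included to illustrate why computation grows with the number of activated experts (each extra routed expert adds a forward pass); the paper's merging of the selected experts into a single one is not shown. Sizes and routing details are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x):                         # x: (tokens, dim)
        scores = F.softmax(self.gate(x), dim=-1)
        topw, topi = scores.topk(self.k, dim=-1)  # routing weights and expert ids
        out = torch.zeros_like(x)
        for slot in range(self.k):                # each slot is one more expert pass
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topw[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(16, 64)).shape)       # torch.Size([16, 64])
```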

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

no code implementations15 Oct 2023 Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, DaCheng Tao

The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost.

Question Answering

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

no code implementations20 Oct 2023 Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

The key algorithm in solving ZSAQ is the SAM-SGA optimization, which aims to improve the quantization accuracy and model generalization via optimizing a minimax problem.

Language Modelling Quantization

SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification

1 code implementation26 Nov 2023 Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong

The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together.

Data Augmentation Multi-Label Image Classification

Exploring Sparsity in Graph Transformers

no code implementations9 Dec 2023 Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu, Bo Du

Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

1 code implementation11 Dec 2023 Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, DaCheng Tao

At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model.

Meta-Learning

POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation

no code implementations11 Jan 2024 Shilong Pan, Zhiliang Tian, Liang Ding, Zhen Huang, Zhihua Wen, Dongsheng Li

POMP involves constructing a directed acyclic meta-graph for each source language, from which we dynamically sample multiple paths to prompt LLMs to mitigate the linguistic noise and improve translations during training.

In-Context Learning Machine Translation +3

OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models

1 code implementation12 Jan 2024 Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, DaCheng Tao

Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks largely neglect object-oriented programming (OOP) in favor of functional programming (FP), e.g., HumanEval and MBPP.

Code Generation

Intention Analysis Makes LLMs A Good Jailbreak Defender

no code implementations12 Jan 2024 Yuqi Zhang, Liang Ding, Lefei Zhang, DaCheng Tao

Aligning large language models (LLMs) with human values, particularly in the face of stealthy and complex jailbreak attacks, presents a formidable challenge.

Revisiting Demonstration Selection Strategies in In-Context Learning

no code implementations22 Jan 2024 Keqin Peng, Liang Ding, Yancheng Yuan, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao

In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.

In-Context Learning

A General Theory for Kernel Packets: from state space model to compactly supported basis

no code implementations6 Feb 2024 Liang Ding, Rui Tuo

We prove that an $m$-dimensional SS model formulation of GP is equivalent to a concept we introduce as the general right Kernel Packet (KP): a transformation for the GP covariance function $K$ such that $\sum_{i=0}^{m} a_i D_t^{(j)} K(t, t_i) = 0$ holds for any $t \leq t_1$, $0 \leq j \leq m-1$, and $m+1$ consecutive points $t_i$, where $D_t^{(j)} f(t)$ denotes the $j$-th order derivative acting on $t$.

Mitigating Reward Hacking via Information-Theoretic Reward Modeling

no code implementations14 Feb 2024 Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, DaCheng Tao

Inspired by this finding, we propose the Integrated Cluster Deviation Score (ICDS), which quantifies deviations in the latent space, as an indicator of reward overoptimization to facilitate the development of online mitigation strategies.

DB-LLM: Accurate Dual-Binarization for Efficient LLMs

no code implementations19 Feb 2024 Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, DaCheng Tao

Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment.

Binarization Computational Efficiency +1

ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding

no code implementations19 Feb 2024 Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical.

Revisiting Knowledge Distillation for Autoregressive Language Models

no code implementations19 Feb 2024 Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, DaCheng Tao

Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model.

Knowledge Distillation

Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation

no code implementations20 Feb 2024 Zhiyao Ren, Yibing Zhan, Baosheng Yu, Liang Ding, DaCheng Tao

The copilot framework, which aims to enhance and tailor large language models (LLMs) for specific complex tasks without requiring fine-tuning, is gaining increasing attention from the community.

Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction

1 code implementation15 Mar 2024 Ziyang Xu, Keqin Peng, Liang Ding, DaCheng Tao, Xiliang Lu

Experiments across various prompts, PLMs, and benchmarks show that our approach can not only correct the overfitted performance caused by prompt bias, but also significantly improve the prompt retrieval capability (up to 10% absolute performance gain).

Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression

no code implementations20 Mar 2024 Lu Zou, Liang Ding

By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively.

feature selection GPR +1

Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning

no code implementations 21 Mar 2024 Changtong Zan, Liang Ding, Li Shen, Yibing Zhan, Weifeng Liu, DaCheng Tao

In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.

In-Context Learning Instruction Following +1

Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding

no code implementations27 Mar 2024 Xintong Wang, Jingheng Pan, Liang Ding, Chris Biemann

Our method is inspired by our observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules.

Attribute Decision Making
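
A hedged sketch of the decoding-time contrast described above: the next-token logits under the standard instruction are sharpened against logits obtained under a disturbance instruction that amplifies hallucination. The (1 + alpha) / alpha weighting is the common contrastive-decoding form and an assumption here, not necessarily the paper's exact formula.

```python
import torch

def contrastive_next_token_logits(logits_standard, logits_disturbed, alpha=1.0):
    """Both inputs: (vocab,) next-token logits under the two prompt variants."""
    return (1 + alpha) * logits_standard - alpha * logits_disturbed

std = torch.randn(32000)                      # illustrative vocabulary size
disturbed = std + 0.5 * torch.randn(32000)    # stand-in for the disturbed pass
print(int(contrastive_next_token_logits(std, disturbed).argmax()))
```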

Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation

1 code implementation ACL 2022 Liang Ding, Longyue Wang, Shuming Shi, DaCheng Tao, Zhaopeng Tu

In this work, we provide an appealing alternative for NAT – monolingual KD, which trains NAT student on external monolingual data with AT teacher trained on the original bilingual data.

Knowledge Distillation Translation +1
