Search Results for author: Qihuang Zhong

Found 17 papers, 10 papers with code

ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding

no code implementations • 19 Feb 2024 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical.

Revisiting Knowledge Distillation for Autoregressive Language Models

no code implementations • 19 Feb 2024 • Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, DaCheng Tao

Knowledge distillation (KD) is a common approach to compressing a teacher model by training a smaller student model, thereby reducing inference cost and memory footprint.

Knowledge Distillation
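
For context, the generic KD objective being revisited has the student match the teacher's temperature-softened token distribution. Below is a minimal, hedged sketch of that vanilla recipe (Hinton-style KL on logits); it is illustrative only and not the variant proposed in the paper.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Vanilla token-level KD: KL(teacher || student) on temperature-softened
    distributions. Logits are assumed to have shape (batch, seq_len, vocab)."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t ** 2)
```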

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

no code implementations • 20 Oct 2023 • Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

The key algorithm for solving ZSAQ is the SAM-SGA optimization, which aims to improve quantization accuracy and model generalization by optimizing a minimax problem.

Language Modelling · Quantization
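
Read generically, that minimax problem has the sharpness-aware form below, where $Q(\cdot)$ denotes the quantization function, $w$ the weights, and $\rho$ the perturbation radius; this is an assumed, textbook-style formulation rather than the exact ZSAQ objective.

$$\min_{w}\; \max_{\|\epsilon\|_2 \le \rho}\; \mathcal{L}\big(Q(w + \epsilon)\big)$$

The inner ascent seeks the worst-case weight perturbation, and the outer step minimizes the loss of the perturbed quantized model, which is how quantization accuracy gets tied to flatter minima.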

Self-Evolution Learning for Discriminative Language Model Pretraining

1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy.

Language Modelling · Masked Language Modeling +1
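
For reference, the conventional random strategy mentioned above looks roughly like the sketch below (the standard BERT recipe with the usual 15% / 80-10-10 defaults); the paper's self-evolution scheme replaces this, and the function is illustrative only.

```python
import torch

def random_mask(input_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """BERT-style random masking: pick ~15% of positions uniformly at random,
    then replace 80% with [MASK], 10% with a random token, and keep 10%."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, mask_prob)).bool()
    labels[~masked] = -100  # loss is only computed on masked positions
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_token_id
    rand = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[rand] = torch.randint(vocab_size, input_ids.shape)[rand]
    return input_ids, labels
```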

Revisiting Token Dropping Strategy in Efficient BERT Pretraining

1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, DaCheng Tao

Token dropping is a recently proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers.
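
Conceptually, the skipped computation looks like the sketch below: the middle layers only process a high-scoring subset of tokens, and the dropped tokens are merged back afterwards. The layer stack, the norm-based importance proxy, and all names are assumptions for illustration, not the paper's implementation.

```python
import torch

def forward_with_token_dropping(layers, hidden, keep_ratio=0.5,
                                drop_start=4, drop_end=8):
    """Run a transformer stack, letting layers [drop_start, drop_end) see only
    the top-scoring tokens; dropped tokens skip those layers entirely."""
    _, seq_len, dim = hidden.shape
    full, keep_idx = None, None
    for i, layer in enumerate(layers):
        if i == drop_start:
            full = hidden                       # dropped tokens keep this state
            scores = hidden.norm(dim=-1)        # crude importance proxy
            k = max(1, int(seq_len * keep_ratio))
            keep_idx = scores.topk(k, dim=1).indices
            idx = keep_idx.unsqueeze(-1).expand(-1, -1, dim)
            hidden = torch.gather(hidden, 1, idx)
        if i == drop_end:
            idx = keep_idx.unsqueeze(-1).expand(-1, -1, dim)
            hidden = full.scatter(1, idx, hidden)  # merge kept tokens back
            full, keep_idx = None, None
        hidden = layer(hidden)
    return hidden
```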

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

no code implementations • 22 May 2023 • Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, DaCheng Tao

However, most mixup methods do not consider the varying degree of learning difficulty across different stages of training, and they generate new samples with one-hot labels, leading to model over-confidence.

Data Augmentation · Few-Shot Text Classification +1
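
For reference, generic mixup interpolates both the inputs and the labels; the over-confidence issue noted above arises when a text-mixup variant assigns a single one-hot label to the mixed sample instead of the interpolated soft label. The sketch below shows the soft-label form with illustrative names; it is not the method proposed in the paper.

```python
import torch
import torch.nn.functional as F

def mixup(emb_a, emb_b, label_a, label_b, num_classes, alpha=0.4):
    """Interpolate two sentence embeddings and their (soft) labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed_emb = lam * emb_a + (1 - lam) * emb_b
    mixed_label = (lam * F.one_hot(label_a, num_classes).float()
                   + (1 - lam) * F.one_hot(label_b, num_classes).float())
    return mixed_emb, mixed_label
```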

Towards Making the Most of ChatGPT for Machine Translation

1 code implementation • 24 Mar 2023 • Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao

We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually achieves better performance; 2) Emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations in non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still needs to be highlighted for the MT/NLP community.

In-Context Learning · Machine Translation +2
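
A hedged sketch of a prompt following findings 2) and 3) above, paired with a low temperature per finding 1). The wording, the `build_mt_messages` helper, and the `translate` client call are all illustrative assumptions, not the paper's exact prompts or a real API.

```python
def build_mt_messages(src_text, src_lang="German", tgt_lang="English", domain="news"):
    """Emphasize the task and the domain in the system prompt."""
    system = (f"You are a professional translator specialized in the {domain} domain. "
              f"Translate the following {src_lang} sentence into {tgt_lang} and "
              "return only the translation.")
    return [{"role": "system", "content": system},
            {"role": "user", "content": src_text}]

# Hypothetical usage with any chat-completion client:
# reply = translate(build_mt_messages("Guten Morgen!"), temperature=0.2)
```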

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks

no code implementations • 1 Mar 2023 • Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, DaCheng Tao

Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically for training large-scale deep neural networks, but without a theoretical guarantee, owing to the threefold difficulty of analyzing the coupled perturbation step, adaptive learning rate, and momentum step.
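
Read at a high level, the update couples SAM's worst-case weight perturbation with an update from an adaptive, momentum-based optimizer such as Adam. Below is a minimal sketch under that reading; the function, its arguments, and the usage lines are assumptions, not the authors' code.

```python
import torch

def adasam_step(params, closure, optimizer, rho=0.05):
    """One SAM-style step whose outer update is delegated to an adaptive
    optimizer (e.g. Adam), which supplies the adaptive learning rate and
    momentum. `params` is a list of parameters; `closure()` recomputes the loss."""
    closure().backward()
    grads = [p.grad for p in params if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    eps = []
    with torch.no_grad():
        for p in params:
            e = None if p.grad is None else rho * p.grad / grad_norm
            if e is not None:
                p.add_(e)                    # ascend to the worst-case point
            eps.append(e)
    optimizer.zero_grad()
    closure().backward()                     # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in zip(params, eps):
            if e is not None:
                p.sub_(e)                    # restore the original weights
    optimizer.step()                         # adaptive-lr + momentum update
    optimizer.zero_grad()

# Hypothetical usage:
# opt = torch.optim.Adam(model.parameters(), lr=2e-5)
# adasam_step(list(model.parameters()), lambda: model(batch).loss, opt)
```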

Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT

1 code implementation • 19 Feb 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.

Question Answering · Sentiment Analysis

Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE

no code implementations • 18 Feb 2023 • Qihuang Zhong, Liang Ding, Keqin Peng, Juhua Liu, Bo Du, Li Shen, Yibing Zhan, DaCheng Tao

This technical report briefly describes our JDExplore d-team's submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.

Contrastive Learning · Denoising +12

Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models

1 code implementation • 11 Oct 2022 • Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, DaCheng Tao

Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization.

PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation

1 code implementation • 22 Aug 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

Prompt Transfer (PoT) is a recently proposed approach to improving prompt tuning, in which the target prompt is initialized with an existing prompt trained on similar source tasks.

General Knowledge · Knowledge Distillation +1
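
For context, vanilla Prompt Transfer amounts to initializing the target task's soft prompt from a prompt already tuned on a source task, then training only the prompt while the backbone stays frozen. The sketch below shows that initialization with illustrative shapes and a hypothetical checkpoint name; PANDA's knowledge-distillation component is not shown.

```python
import torch

prompt_len, hidden_size = 20, 768
source_prompt = torch.load("source_task_prompt.pt")        # (prompt_len, hidden_size), hypothetical file
target_prompt = torch.nn.Parameter(source_prompt.clone())  # PoT: initialize target from source
optimizer = torch.optim.AdamW([target_prompt], lr=3e-2)    # only the prompt is trained
```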

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation

1 code implementation • 30 May 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays a more important, yet under-exploited, role than the decoder with respect to downstream performance and neuron activation.

Denoising · Language Modelling +2

A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis

1 code implementation • COLING 2022 • Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, DaCheng Tao

Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence.

Aspect-Based Sentiment Analysis · Aspect-Based Sentiment Analysis (ABSA) +4
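
To make the task concrete, aspect-level labels attach a polarity to each aspect term rather than to the whole sentence; the record below is a made-up example, not data from the paper.

```python
sample = {
    "sentence": "The battery life is great, but the screen is too dim.",
    "aspects": [
        {"term": "battery life", "polarity": "positive"},
        {"term": "screen", "polarity": "negative"},
    ],
}
```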

Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis

1 code implementation • 13 Jan 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, DaCheng Tao

To this end, we propose a knowledge graph augmented network, KGAN, which aims to effectively incorporate external knowledge alongside explicit syntactic and contextual information.

Aspect-Based Sentiment Analysis · Aspect-Based Sentiment Analysis (ABSA) +2

Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis

1 code implementation • 26 Oct 2021 • Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, DaCheng Tao

In practice, we formulate the models pretrained on the sampled instances as a knowledge guidance model and a learner model, respectively.

Aspect-Based Sentiment Analysis · Aspect-Based Sentiment Analysis (ABSA) +2
