no code implementations • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Masked language modeling, widely used in discriminative language model (e.g., BERT) pretraining, commonly adopts a random masking strategy.
no code implementations • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, DaCheng Tao
Token dropping is a recently proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers.
no code implementations • 22 May 2023 • Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, DaCheng Tao
However, most mixup methods do not consider the varying degree of learning difficulty at different stages of training, and they generate new samples with one-hot labels, resulting in model overconfidence.
1 code implementation • 24 Mar 2023 • Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao
We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually achieves better performance; 2) Emphasizing the task information further improves ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still needs to be highlighted for the MT/NLP community.
no code implementations • 1 Mar 2023 • Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, DaCheng Tao
Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically for training large-scale deep neural networks, but without a theoretical guarantee, owing to the triple difficulty of analyzing the coupled perturbation step, adaptive learning rate, and momentum step.
1 code implementation • 19 Feb 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
no code implementations • 18 Feb 2023 • Qihuang Zhong, Liang Ding, Keqin Peng, Juhua Liu, Bo Du, Li Shen, Yibing Zhan, DaCheng Tao
This technical report briefly describes our JDExplore d-team's submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
no code implementations • 4 Dec 2022 • Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, DaCheng Tao
This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
1 code implementation • 11 Oct 2022 • Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, DaCheng Tao
Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization.
no code implementations • 22 Aug 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
In response to these problems, we propose a new metric to accurately predict the prompt transferability (regarding (i)), and a novel PoT approach (namely PANDA) that leverages the knowledge distillation technique to transfer the "knowledge" from the source prompt to the target prompt in a subtle manner and effectively alleviate catastrophic forgetting (regarding (ii)).
1 code implementation • 30 May 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao
To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays an important but under-exploited role compared with the decoder regarding the downstream performance and neuron activation.
1 code implementation • COLING 2022 • Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, DaCheng Tao
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence.
1 code implementation • 13 Jan 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, DaCheng Tao
To this end, we propose a knowledge graph augmented network (KGAN), which aims to effectively incorporate external knowledge with explicit syntactic and contextual information.
1 code implementation • 26 Oct 2021 • Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, DaCheng Tao
In practice, we formulate the model pretrained on the sampled instances into a knowledge guidance model and a learner model, respectively.