First, a RoBERTa model is applied to give a local ranking of the candidate sentences.
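A minimal sketch of such local ranking with a RoBERTa cross-encoder follows; the checkpoint name, the single-logit scoring head, and the `rank_candidates` helper are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch: score each candidate sentence against a query with a
# RoBERTa cross-encoder and sort by the scalar score to get a local ranking.
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=1)
model.eval()

def rank_candidates(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Return candidates sorted by a relevance score from the cross-encoder."""
    inputs = tokenizer([query] * len(candidates), candidates,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)  # one scalar score per candidate
    order = torch.argsort(scores, descending=True)
    return [(candidates[i], scores[i].item()) for i in order]
```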
Building on this observation, we use mutual learning to improve BERT's early-exiting performance; that is, we ask the exits of a multi-exit BERT to distill knowledge from one another.
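A minimal sketch of such a mutual-learning objective is given below: each exit is trained with cross-entropy plus a KL term that distills from the (detached) softened predictions of every other exit. The temperature and weighting are illustrative choices, not the paper's exact hyper-parameters.

```python
import torch
import torch.nn.functional as F

def mutual_learning_loss(exit_logits: list[torch.Tensor], labels: torch.Tensor,
                         temperature: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """exit_logits: one (batch, num_classes) tensor per exit of the multi-exit model."""
    total = 0.0
    for i, logits_i in enumerate(exit_logits):
        ce = F.cross_entropy(logits_i, labels)
        kd = 0.0
        for j, logits_j in enumerate(exit_logits):
            if i == j:
                continue
            # Each exit matches the softened distribution of every other exit.
            kd = kd + F.kl_div(
                F.log_softmax(logits_i / temperature, dim=-1),
                F.softmax(logits_j.detach() / temperature, dim=-1),
                reduction="batchmean",
            ) * (temperature ** 2)
        kd = kd / max(len(exit_logits) - 1, 1)
        total = total + (1 - alpha) * ce + alpha * kd
    return total / len(exit_logits)
```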
However, the scarcity of diverse, naturalistic, large-scale datasets of human preferences on LLM outputs poses a great challenge to RLHF and to feedback-learning research within the open-source community.
To train UDR, we cast the training signals of various tasks into a unified list-wise ranking formulation using the language model's feedback.
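A rough sketch of one such list-wise objective is shown below: candidates are ranked by an LM-derived reward, and the retriever's scores are trained to reproduce that ranking with a ListNet-style soft cross-entropy. The exact loss used by UDR may differ; this only illustrates the unified formulation.

```python
import torch
import torch.nn.functional as F

def listwise_loss(retriever_scores: torch.Tensor, lm_feedback: torch.Tensor,
                  tau: float = 1.0) -> torch.Tensor:
    """retriever_scores, lm_feedback: (num_candidates,) tensors for one training query."""
    target = F.softmax(lm_feedback / tau, dim=-1)        # ranking signal from the LM
    log_pred = F.log_softmax(retriever_scores, dim=-1)   # retriever's predicted ranking
    return -(target * log_pred).sum()                    # soft cross-entropy (ListNet-style)
```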
Previous works usually adopt heuristic metrics such as the entropy of internal outputs to measure instance difficulty, which generalize poorly and require careful threshold tuning.
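For reference, the heuristic criterion referred to here typically looks like the following: exit at the first layer whose prediction entropy falls below a hand-tuned threshold. The threshold value and helper name are illustrative.

```python
import torch
import torch.nn.functional as F

def entropy_early_exit(exit_logits: list[torch.Tensor], threshold: float = 0.3):
    """For a single instance, return (prediction, exit_index) using each exit's entropy.

    exit_logits: one (num_classes,) tensor per internal classifier, in layer order.
    """
    for k, logits in enumerate(exit_logits):
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
        if entropy.item() < threshold:          # confident enough: stop at this exit
            return probs.argmax(dim=-1), k
    return exit_logits[-1].argmax(dim=-1), len(exit_logits) - 1
```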
Artificial Intelligence (AI), along with the recent progress in biomedical language understanding, is gradually changing medical practice.
When developing an online question-answering system for the medical domain, natural language inference (NLI) models play a central role in question matching and intention detection.
An ablation study demonstrates the necessity of our search-space design and the effectiveness of our search method.
Although transformer architectures dominate many natural language understanding tasks, several issues in training transformer models remain unsolved, notably the need for a principled warm-up schedule, which has proven important for stable training, and the question of whether a given task benefits from scaling the attention product.
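A minimal sketch of the two training choices mentioned here is given below: a linear learning-rate warm-up and a flag controlling whether the attention product is scaled by sqrt(d_k). The default values are illustrative, not the paper's settings.

```python
import math
import torch

def warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 4000) -> float:
    """Linearly increase the learning rate over the first warmup_steps updates."""
    return base_lr * min(1.0, step / warmup_steps)

def attention_scores(q: torch.Tensor, k: torch.Tensor, scale: bool = True) -> torch.Tensor:
    """Dot-product attention logits, optionally scaled by sqrt(d_k)."""
    scores = q @ k.transpose(-2, -1)
    if scale:
        scores = scores / math.sqrt(q.size(-1))
    return scores
```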
We experiment with both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset by leveraging datasets of similar tasks, and (b) incorporating the distributional representations of a knowledge graph (KG) into the representations of pre-trained language models, via simple concatenation or multi-head attention.
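A hedged sketch of option (b) follows: injecting KG entity embeddings into a pre-trained language model's token representations, either by concatenation plus a projection or by multi-head attention over the KG vectors. The dimensions, module names, and residual fusion are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class KGFusion(nn.Module):
    def __init__(self, hidden_dim: int = 768, kg_dim: int = 200,
                 num_heads: int = 8, mode: str = "concat"):
        super().__init__()
        self.mode = mode
        self.proj = nn.Linear(hidden_dim + kg_dim, hidden_dim)   # for concatenation
        self.kg_proj = nn.Linear(kg_dim, hidden_dim)             # for attention fusion
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, plm_states: torch.Tensor, kg_embeds: torch.Tensor) -> torch.Tensor:
        """plm_states: (batch, seq, hidden).
        kg_embeds: (batch, seq, kg_dim) aligned per token for "concat",
                   or (batch, num_entities, kg_dim) for attention fusion.
        """
        if self.mode == "concat":
            return self.proj(torch.cat([plm_states, kg_embeds], dim=-1))
        kg = self.kg_proj(kg_embeds)
        fused, _ = self.attn(query=plm_states, key=kg, value=kg)
        return plm_states + fused   # residual connection around the attention fusion
```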
We also experiment with transfer learning from the NLI task to the RQE task, which proves useful in improving the results of fine-tuning MT-DNN-large.
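A minimal sketch of that transfer setup is given below: first fine-tune a sequence-pair classifier on NLI, then continue fine-tuning the same weights on the RQE data. The checkpoint name, the binary-entailment framing of both tasks, and the `fine_tune` helper are hypothetical placeholders, not the exact MT-DNN-large recipe.

```python
from transformers import AutoModelForSequenceClassification

def transfer_nli_to_rqe(fine_tune, nli_data, rqe_data):
    """`fine_tune(model, data)` is a hypothetical helper that runs one training stage."""
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-large-uncased", num_labels=2)   # assumes both tasks cast as binary entailment
    fine_tune(model, nli_data)   # stage 1: fine-tune on the NLI task
    fine_tune(model, rqe_data)   # stage 2: continue on RQE from the NLI-tuned weights
    return model
```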