Search Results for author: Zhiyuan Zeng

Found 18 papers, 10 papers with code

Gradient-Based Adversarial Factual Consistency Evaluation for Abstractive Summarization

no code implementations • EMNLP 2021 • Zhiyuan Zeng, Jiaze Chen, Weiran Xu, Lei Li

Based on the artificial dataset, we train an evaluation model that not only makes accurate and robust factual consistency judgments but can also trace factual errors interpretably via the backpropagated gradient distribution over token embeddings.

Abstractive Text Summarization • Data Augmentation
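As a rough illustration of the gradient-based tracing described in the snippet above, the sketch below scores a document–summary pair with a consistency classifier and attributes the score to tokens via gradients on the input embeddings. The bert-base-uncased checkpoint and the "label 1 = consistent" convention are assumptions, not the authors' released artifacts.

```python
# Hedged sketch of gradient-based factual-error tracing: score a
# document-summary pair, then backpropagate the "consistent" score to the
# token embeddings and read off a per-token saliency.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("the source document ...", "the summary ...", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_(True)

logits = model(inputs_embeds=embeds,
               attention_mask=inputs["attention_mask"],
               token_type_ids=inputs["token_type_ids"]).logits
logits[0, 1].backward()  # gradient of the assumed "consistent" logit

# Per-token saliency: gradient norm over the embedding dimension.
# Tokens with large saliency are candidate factual-error locations.
saliency = embeds.grad.norm(dim=-1).squeeze(0)
for tok, s in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), saliency):
    print(f"{tok:>12s}  {s:.4f}")
```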

Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

no code implementations • 17 Feb 2024 • Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification.

Computational Efficiency
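The snippet above names the two rectification strategies without detail. The toy sketch below shows the underlying problem (top-$k$ routing drops a token once its chosen expert hits capacity) and a naive fill-in-style fix that re-routes dropped tokens to the next-best expert with spare room; all sizes are made up, and this is not the paper's Rectify-Router.

```python
# Toy illustration (not the paper's method): capacity-limited top-k routing
# drops tokens; a simplified "fill-in" pass re-assigns them.
import torch

num_tokens, num_experts, k, capacity = 8, 4, 2, 4

router_probs = torch.randn(num_tokens, num_experts).softmax(-1)
topk_probs, topk_experts = torch.topk(router_probs, k)        # (tokens, k)

load = [0] * num_experts
assigned = [[] for _ in range(num_experts)]
dropped = []
for t in range(num_tokens):
    for slot in range(k):
        e = topk_experts[t, slot].item()
        if load[e] < capacity:
            assigned[e].append(t)
            load[e] += 1
        else:
            dropped.append(t)            # over capacity: token is dropped

# Fill-in rectification (simplified): next-best expert that still has room.
for t in dropped:
    for e in router_probs[t].argsort(descending=True).tolist():
        if load[e] < capacity:
            assigned[e].append(t)
            load[e] += 1
            break
print("per-expert load:", load)
```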

Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

no code implementations • 26 Jan 2024 • Zhaoye Fei, Yunfan Shao, Linyang Li, Zhiyuan Zeng, Conghui He, Hang Yan, Dahua Lin, Xipeng Qiu

Large language models have demonstrated remarkable potential in various tasks; however, there remains a significant scarcity of open-source models and data for specific domains.

Language Modelling • Large Language Model

Evaluating Large Language Models at Evaluating Instruction Following

1 code implementation • 11 Oct 2023 • Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen

As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluations for comparing the ever-increasing list of models.

Instruction Following

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

1 code implementation • 10 Oct 2023 • Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen

In this work, we study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.

Language Modelling • Question Answering • +1
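To make "structured pruning" concrete, the sketch below ranks attention heads by a simple magnitude proxy and removes the weakest as whole units. Sheared LLaMA learns pruning masks end-to-end against a target architecture, so this ranking is only an assumption-level illustration, with GPT-2 standing in for a larger model.

```python
# Assumption-level sketch: rank attention heads by weight magnitude and
# prune the weakest whole heads. Not Sheared LLaMA's mask-learning method.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")
num_heads, head_dim = 12, 64

# Output projection of the first attention block: rows are grouped by head.
w = model.h[0].attn.c_proj.weight              # shape (768, 768)
head_scores = w.view(num_heads, -1).norm(dim=1)
to_prune = head_scores.argsort()[:4].tolist()  # 4 weakest heads
print("prune heads:", sorted(to_prune))

# transformers can then remove them as whole structural units:
model.h[0].attn.prune_heads(set(to_prune))
```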

Plug-and-Play Knowledge Injection for Pre-trained Language Models

1 code implementation • 28 May 2023 • Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models.
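A minimal sketch of the map-tuning idea as the snippet describes it: the downstream LM stays frozen, and only a small mapping network that projects external knowledge embeddings into the LM's input-embedding space is trained. The dimensions and single-linear-layer design below are assumptions.

```python
# Sketch of map-tuning: train only a projection from the knowledge-embedding
# space into the frozen LM's input-embedding space.
import torch
import torch.nn as nn

kb_dim, lm_dim = 100, 768

mapper = nn.Linear(kb_dim, lm_dim)       # the only trainable module
entity_embed = torch.randn(1, kb_dim)    # pretrained KB/entity embedding
injected = mapper(entity_embed)          # now lives in the LM's embed space

# `injected` can be spliced into the LM input sequence next to ordinary
# token embeddings, leaving every LM parameter untouched (plug-and-play).
print(injected.shape)                    # torch.Size([1, 768])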

Emergent Modularity in Pre-trained Transformers

1 code implementation • 28 May 2023 • Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

By analogy with human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes.
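A toy version of the specialization measurement mentioned above: compare a neuron's mean activation across two input groups (functions) and call the most differentiated neurons specialized. The paper's protocol is richer; the fake activations and the difference statistic below are assumptions.

```python
# Toy measurement of "functional specialization" with fabricated activations.
import torch

acts_fn_a = torch.randn(100, 3072).relu()   # FFN activations, function A
acts_fn_b = torch.randn(100, 3072).relu()   # FFN activations, function B

specialization = (acts_fn_a.mean(0) - acts_fn_b.mean(0)).abs()
print("most specialized neurons:", specialization.topk(10).indices.tolist())
```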

KNIFE: Distilling Reasoning Knowledge From Free-Text Rationales

no code implementations • 19 Dec 2022 • Aaron Chan, Zhiyuan Zeng, Wyatt Lake, Brihi Joshi, Hanjie Chen, Xiang Ren

First, KNIFE finetunes a teacher LM (given task input and FTR) to predict the task output, transferring reasoning knowledge from the FTRs to the teacher's hidden states.

Knowledge Distillation • Language Modelling • +1
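A sketch of the transfer step described in the snippet: a student seeing only the task input is trained to match the hidden states of the teacher that also saw the free-text rationale, alongside an ordinary task loss. The MSE choice, and the assumption that the two hidden-state tensors are already aligned on task-input positions, are mine.

```python
# Hidden-state distillation sketch in the spirit of KNIFE's transfer step.
import torch
import torch.nn.functional as F

def knife_style_loss(student_hidden, teacher_hidden, student_logits, labels):
    distill = F.mse_loss(student_hidden, teacher_hidden.detach())
    task = F.cross_entropy(student_logits, labels)
    return task + distill

student_h = torch.randn(4, 16, 768, requires_grad=True)  # (batch, seq, dim)
teacher_h = torch.randn(4, 16, 768)
logits = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))
print(knife_style_loss(student_h, teacher_h, logits, labels))
```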

Disentangling Confidence Score Distribution for Out-of-Domain Intent Detection with Energy-Based Learning

no code implementations • 17 Oct 2022 • Yanan Wu, Zhiyuan Zeng, Keqing He, Yutao Mou, Pei Wang, Yuanmeng Yan, Weiran Xu

In this paper, we propose a simple but strong energy-based score function for OOD detection, where the energy scores of OOD samples are higher than those of IND samples.

Intent Detection • Out of Distribution (OOD) Detection
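The energy score referred to above is commonly computed as E(x) = -T · logsumexp(f(x)/T) over the classifier logits f(x): confident in-domain logits yield very negative energy, so OOD queries score higher. A minimal version, with the temperature and threshold as assumptions:

```python
# Energy-based OOD scoring: higher energy => more OOD-like.
import torch

def energy_score(logits, temperature=1.0):
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

logits = torch.randn(5, 20)        # 5 queries, 20 in-domain intent classes
scores = energy_score(logits)
is_ood = scores > -1.0             # illustrative threshold, tuned in practice
print(scores.tolist(), is_ood.tolist())
```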

Distribution Calibration for Out-of-Domain Detection with Bayesian Approximation

1 code implementation • COLING 2022 • Yanan Wu, Zhiyuan Zeng, Keqing He, Yutao Mou, Pei Wang, Weiran Xu

Out-of-Domain (OOD) detection, which aims to identify whether a query falls outside the predefined set of supported intents, is a key component of a task-oriented dialog system.

Out of Distribution (OOD) Detection
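One common reading of "Bayesian approximation" in this setting is Monte Carlo dropout (the paper's exact calibration procedure may differ): keep dropout stochastic at inference, run several forward passes, and use the disagreement across passes as an uncertainty signal for the OOD decision.

```python
# Monte Carlo dropout sketch: uncertainty from disagreement across passes.
import torch
import torch.nn as nn

clf = nn.Sequential(nn.Linear(768, 256), nn.ReLU(),
                    nn.Dropout(0.3), nn.Linear(256, 20))
clf.train()                        # deliberately keeps Dropout active

x = torch.randn(1, 768)            # one encoded query
with torch.no_grad():
    probs = torch.stack([clf(x).softmax(-1) for _ in range(20)])

mean_conf = probs.mean(0).max().item()    # averaged confidence
disagreement = probs.var(0).max().item()  # spread across the 20 passes
print(mean_conf, disagreement)            # high spread suggests OOD
```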

Unsupervised and Few-shot Parsing from Pretrained Language Models

no code implementations • 10 Jun 2022 • Zhiyuan Zeng, Deyi Xiong

We therefore extend the unsupervised models to few-shot parsing models (FPOA, FPIO) that use a few annotated trees to learn better linear projection matrices for parsing.

Language Modelling
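A generic sketch of the projection-learning step mentioned above: fit a linear map B so that squared distances between projected hidden states match gold tree distances, structural-probe style. FPOA/FPIO specifics are not reproduced; the shapes and loss are assumptions.

```python
# Learn a linear projection whose pairwise distances track tree distances.
import torch

hidden = torch.randn(12, 768)                       # one sentence
tree_dist = torch.randint(1, 6, (12, 12)).float()   # gold pairwise distances
B = torch.randn(768, 128, requires_grad=True)
opt = torch.optim.Adam([B], lr=1e-3)

for _ in range(200):
    pred = torch.cdist(hidden @ B, hidden @ B) ** 2
    loss = (pred - tree_dist).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```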

An Empirical Study on Adversarial Attack on NMT: Languages and Positions Matter

no code implementations • ACL 2021 • Zhiyuan Zeng, Deyi Xiong

For autoregressive NMT models that generate target words from left to right, we observe that adversarial attacks on the source language are more effective than attacks on the target language. Attacking the front positions of target sentences, or the source positions aligned to those front positions, is also more effective than attacking other positions.

Adversarial Attack • NMT
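A toy illustration of position-targeted attacks in this spirit: rank source positions by the translation-loss gradient on their embeddings and perturb the most salient first. The tanh-sum loss below is a stand-in for a real NMT loss; the paper's actual attack is not reproduced here.

```python
# Rank source positions by gradient saliency to choose attack targets.
import torch

src_embeds = torch.randn(1, 15, 512, requires_grad=True)  # source embeddings
loss = src_embeds.tanh().sum()     # placeholder for the NMT training loss
loss.backward()

saliency = src_embeds.grad.norm(dim=-1).squeeze(0)
attack_order = saliency.argsort(descending=True)
print("attack positions in order:", attack_order.tolist())
```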
