Search Results for author: Zhiyuan Zeng

Found 18 papers, 10 papers with code

Gradient-Based Adversarial Factual Consistency Evaluation for Abstractive Summarization

no code implementations • EMNLP 2021 • Zhiyuan Zeng, Jiaze Chen, Weiran Xu, Lei Li

Based on the artificial dataset, we train an evaluation model that not only makes accurate and robust factual consistency judgments but can also trace factual errors interpretably via the backpropagated gradient distribution over token embeddings.

Abstractive Text Summarization • Data Augmentation
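As a rough illustration of the gradient-based tracing described in the snippet above, the sketch below scores a document–summary pair with a consistency classifier and attributes the score to tokens via gradients on the input embeddings. The bert-base-uncased checkpoint and the "label 1 = consistent" convention are assumptions, not the authors' released artifacts.

```python
# Hedged sketch of gradient-based factual-error tracing: score a
# document-summary pair, then backpropagate the "consistent" score to the
# token embeddings and read off a per-token saliency.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("the source document ...", "the summary ...", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_(True)

logits = model(inputs_embeds=embeds,
               attention_mask=inputs["attention_mask"],
               token_type_ids=inputs["token_type_ids"]).logits
logits[0, 1].backward()  # gradient of the assumed "consistent" logit

# Per-token saliency: gradient norm over the embedding dimension.
# Tokens with large saliency are candidate factual-error locations.
saliency = embeds.grad.norm(dim=-1).squeeze(0)
for tok, s in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), saliency):
    print(f"{tok:>12s}  {s:.4f}")
```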

Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

no code implementations • 17 Feb 2024 • Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification.

Computational Efficiency
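The snippet above names the two rectification strategies without detail. The toy sketch below shows the underlying problem (top-$k$ routing drops a token once its chosen expert hits capacity) and a naive fill-in-style fix that re-routes dropped tokens to the next-best expert with spare room; all sizes are made up, and this is not the paper's Rectify-Router.

```python
# Toy illustration (not the paper's method): capacity-limited top-k routing
# drops tokens; a simplified "fill-in" pass re-assigns them.
import torch

num_tokens, num_experts, k, capacity = 8, 4, 2, 4

router_probs = torch.randn(num_tokens, num_experts).softmax(-1)
topk_probs, topk_experts = torch.topk(router_probs, k)        # (tokens, k)

load = [0] * num_experts
assigned = [[] for _ in range(num_experts)]
dropped = []
for t in range(num_tokens):
    for slot in range(k):
        e = topk_experts[t, slot].item()
        if load[e] < capacity:
            assigned[e].append(t)
            load[e] += 1
        else:
            dropped.append(t)            # over capacity: token is dropped

# Fill-in rectification (simplified): next-best expert that still has room.
for t in dropped:
    for e in router_probs[t].argsort(descending=True).tolist():
        if load[e] < capacity:
            assigned[e].append(t)
            load[e] += 1
            break
print("per-expert load:", load)
```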

Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

no code implementations • 26 Jan 2024 • Zhaoye Fei, Yunfan Shao, Linyang Li, Zhiyuan Zeng, Conghui He, Hang Yan, Dahua Lin, Xipeng Qiu

Large language models have demonstrated remarkable potential in various tasks; however, there remains a significant scarcity of open-source models and data for specific domains.

Language Modelling • Large Language Model

Evaluating Large Language Models at Evaluating Instruction Following

1 code implementation • 11 Oct 2023 • Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen

As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluations for comparing the ever-increasing list of models.

Instruction Following

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

1 code implementation • 10 Oct 2023 • Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen

In this work, we study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.

Language Modelling • Question Answering • +1
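To make "structured pruning" concrete, the sketch below ranks attention heads by a simple magnitude proxy and removes the weakest as whole units. Sheared LLaMA learns pruning masks end-to-end against a target architecture, so this ranking is only an assumption-level illustration, with GPT-2 standing in for a larger model.

```python
# Assumption-level sketch: rank attention heads by weight magnitude and
# prune the weakest whole heads. Not Sheared LLaMA's mask-learning method.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")
num_heads, head_dim = 12, 64

# Output projection of the first attention block: rows are grouped by head.
w = model.h[0].attn.c_proj.weight              # shape (768, 768)
head_scores = w.view(num_heads, -1).norm(dim=1)
to_prune = head_scores.argsort()[:4].tolist()  # 4 weakest heads
print("prune heads:", sorted(to_prune))

# transformers can then remove them as whole structural units:
model.h[0].attn.prune_heads(set(to_prune))
```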

Plug-and-Play Knowledge Injection for Pre-trained Language Models

1 code implementation • 28 May 2023 • Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models.
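A minimal sketch of the map-tuning idea as the snippet describes it: the downstream LM stays frozen, and only a small mapping network that projects external knowledge embeddings into the LM's input-embedding space is trained. The dimensions and single-linear-layer design below are assumptions.

```python
# Sketch of map-tuning: train only a projection from the knowledge-embedding
# space into the frozen LM's input-embedding space.
import torch
import torch.nn as nn

kb_dim, lm_dim = 100, 768

mapper = nn.Linear(kb_dim, lm_dim)       # the only trainable module
entity_embed = torch.randn(1, kb_dim)    # pretrained KB/entity embedding
injected = mapper(entity_embed)          # now lives in the LM's embed space

# `injected` can be spliced into the LM input sequence next to ordinary
# token embeddings, leaving every LM parameter untouched (plug-and-play).
print(injected.shape)                    # torch.Size([1, 768])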

Emergent Modularity in Pre-trained Transformers

1 code implementation • 28 May 2023 • Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

By analogy with human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes.
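A toy version of the specialization measurement mentioned above: compare a neuron's mean activation across two input groups (functions) and call the most differentiated neurons specialized. The paper's protocol is richer; the fake activations and the difference statistic below are assumptions.

```python
# Toy measurement of "functional specialization" with fabricated activations.
import torch

acts_fn_a = torch.randn(100, 3072).relu()   # FFN activations, function A
acts_fn_b = torch.randn(100, 3072).relu()   # FFN activations, function B

specialization = (acts_fn_a.mean(0) - acts_fn_b.mean(0)).abs()
print("most specialized neurons:", specialization.topk(10).indices.tolist())
```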

KNIFE: Distilling Reasoning Knowledge From Free-Text Rationales

no code implementations • 19 Dec 2022 • Aaron Chan, Zhiyuan Zeng, Wyatt Lake, Brihi Joshi, Hanjie Chen, Xiang Ren

First, KNIFE finetunes a teacher LM (given task input and FTR) to predict the task output, transferring reasoning knowledge from the FTRs to the teacher's hidden states.

Knowledge Distillation • Language Modelling • +1
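A sketch of the transfer step described in the snippet: a student seeing only the task input is trained to match the hidden states of the teacher that also saw the free-text rationale, alongside an ordinary task loss. The MSE choice, and the assumption that the two hidden-state tensors are already aligned on task-input positions, are mine.

```python
# Hidden-state distillation sketch in the spirit of KNIFE's transfer step.
import torch
import torch.nn.functional as F

def knife_style_loss(student_hidden, teacher_hidden, student_logits, labels):
    distill = F.mse_loss(student_hidden, teacher_hidden.detach())
    task = F.cross_entropy(student_logits, labels)
    return task + distill

student_h = torch.randn(4, 16, 768, requires_grad=True)  # (batch, seq, dim)
teacher_h = torch.randn(4, 16, 768)
logits = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))
print(knife_style_loss(student_h, teacher_h, logits, labels))
```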

Disentangling Confidence Score Distribution for Out-of-Domain Intent Detection with Energy-Based Learning

no code implementations • 17 Oct 2022 • Yanan Wu, Zhiyuan Zeng, Keqing He, Yutao Mou, Pei Wang, Yuanmeng Yan, Weiran Xu

In this paper, we propose a simple but strong energy-based score function for OOD detection, where the energy scores of OOD samples are higher than those of IND samples.

Intent Detection • Out of Distribution (OOD) Detection
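The energy score referred to above is commonly computed as E(x) = -T · logsumexp(f(x)/T) over the classifier logits f(x): confident in-domain logits yield very negative energy, so OOD queries score higher. A minimal version, with the temperature and threshold as assumptions:

```python
# Energy-based OOD scoring: higher energy => more OOD-like.
import torch

def energy_score(logits, temperature=1.0):
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

logits = torch.randn(5, 20)        # 5 queries, 20 in-domain intent classes
scores = energy_score(logits)
is_ood = scores > -1.0             # illustrative threshold, tuned in practice
print(scores.tolist(), is_ood.tolist())
```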

Distribution Calibration for Out-of-Domain Detection with Bayesian Approximation

1 code implementation • COLING 2022 • Yanan Wu, Zhiyuan Zeng, Keqing He, Yutao Mou, Pei Wang, Weiran Xu

Out-of-Domain (OOD) detection, which aims to identify whether a query falls outside the predefined set of supported intents, is a key component of a task-oriented dialog system.

Out of Distribution (OOD) Detection
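One common reading of "Bayesian approximation" in this setting is Monte Carlo dropout (the paper's exact calibration procedure may differ): keep dropout stochastic at inference, run several forward passes, and use the disagreement across passes as an uncertainty signal for the OOD decision.

```python
# Monte Carlo dropout sketch: uncertainty from disagreement across passes.
import torch
import torch.nn as nn

clf = nn.Sequential(nn.Linear(768, 256), nn.ReLU(),
                    nn.Dropout(0.3), nn.Linear(256, 20))
clf.train()                        # deliberately keeps Dropout active

x = torch.randn(1, 768)            # one encoded query
with torch.no_grad():
    probs = torch.stack([clf(x).softmax(-1) for _ in range(20)])

mean_conf = probs.mean(0).max().item()    # averaged confidence
disagreement = probs.var(0).max().item()  # spread across the 20 passes
print(mean_conf, disagreement)            # high spread suggests OOD
```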

Unsupervised and Few-shot Parsing from Pretrained Language Models

no code implementations • 10 Jun 2022 • Zhiyuan Zeng, Deyi Xiong

We therefore extend the unsupervised models to few-shot parsing models (FPOA, FPIO) that use a few annotated trees to learn better linear projection matrices for parsing.

Language Modelling
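A generic sketch of the projection-learning step mentioned above: fit a linear map B so that squared distances between projected hidden states match gold tree distances, structural-probe style. FPOA/FPIO specifics are not reproduced; the shapes and loss are assumptions.

```python
# Learn a linear projection whose pairwise distances track tree distances.
import torch

hidden = torch.randn(12, 768)                       # one sentence
tree_dist = torch.randint(1, 6, (12, 12)).float()   # gold pairwise distances
B = torch.randn(768, 128, requires_grad=True)
opt = torch.optim.Adam([B], lr=1e-3)

for _ in range(200):
    pred = torch.cdist(hidden @ B, hidden @ B) ** 2
    loss = (pred - tree_dist).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```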

An Empirical Study on Adversarial Attack on NMT: Languages and Positions Matter

no code implementations • ACL 2021 • Zhiyuan Zeng, Deyi Xiong

For autoregressive NMT models that generate target words from left to right, we observe that adversarial attacks on the source language are more effective than attacks on the target language. Attacking the front positions of target sentences, or the source positions aligned to those front positions, is also more effective than attacking other positions.

Adversarial Attack • NMT
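A toy illustration of position-targeted attacks in this spirit: rank source positions by the translation-loss gradient on their embeddings and perturb the most salient first. The tanh-sum loss below is a stand-in for a real NMT loss; the paper's actual attack is not reproduced here.

```python
# Rank source positions by gradient saliency to choose attack targets.
import torch

src_embeds = torch.randn(1, 15, 512, requires_grad=True)  # source embeddings
loss = src_embeds.tanh().sum()     # placeholder for the NMT training loss
loss.backward()

saliency = src_embeds.grad.norm(dim=-1).squeeze(0)
attack_order = saliency.argsort(descending=True)
print("attack positions in order:", attack_order.tolist())
```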
