Search Results for author: Weizhe Yuan

Found 13 papers, 12 papers with code

LLMCRIT: Teaching Large Language Models to Use Criteria

1 code implementation · 2 Mar 2024 · Weizhe Yuan, PengFei Liu, Matthias Gallé

In particular, we present a model-in-the-loop framework that semi-automatically derives criteria from collected guidelines for different writing tasks and constructs in-context demonstrations for each criterion.

Self-Rewarding Language Models

2 code implementations · 18 Jan 2024 · Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal.

Instruction Following, Language Modelling

The Critique of Critique

1 code implementation · 9 Jan 2024 · Shichao Sun, Junlong Li, Weizhe Yuan, Ruifeng Yuan, Wenjie Li, PengFei Liu

In this paper, we pioneer the critique of critique, termed MetaCritique, a framework that evaluates a critique from two aspects, i.e., factuality as a precision score and comprehensiveness as a recall score.

Question Answering

Generative Judge for Evaluating Alignment

1 code implementation · 9 Oct 2023 · Junlong Li, Shichao Sun, Weizhe Yuan, Run-Ze Fan, Hai Zhao, PengFei Liu

The rapid development of Large Language Models (LLMs) has substantially expanded the range of tasks they can address.

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

3 code implementations · 25 Jul 2023 · I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, PengFei Liu

With the above challenges in mind, in this paper, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT).

Code Generation, Fact Checking +1

T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics

1 code implementation · 12 Dec 2022 · Yiwei Qin, Weizhe Yuan, Graham Neubig, PengFei Liu

Both have their advantages; discriminative metrics are able to directly optimize for the problem of distinguishing between good and bad outputs, while generative metrics can be trained using abundant raw text.

reStructured Pre-training

2 code implementations · 22 Jun 2022 · Weizhe Yuan, PengFei Liu

In addition, we test our model on the 2022 College Entrance Examination English test, held a few days earlier (2022.06.08), and it achieves a total score of 134 (vs.

DataLab: A Platform for Data Analysis and Intervention

no code implementations · ACL 2022 · Yang Xiao, Jinlan Fu, Weizhe Yuan, Vijay Viswanathan, Zhoumianze Liu, Yixin Liu, Graham Neubig, PengFei Liu

Despite data's crucial role in machine learning, most existing tools and research tend to focus on systems on top of existing data rather than how to interpret and manipulate data.

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

1 code implementation · 28 Jul 2021 · PengFei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, Graham Neubig

This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning".

Language Modelling, Zero-Shot Learning

BARTScore: Evaluating Generated Text as Text Generation

1 code implementation · NeurIPS 2021 · Weizhe Yuan, Graham Neubig, PengFei Liu

In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models.

Informativeness, Machine Translation +3

ExplainaBoard: An Explainable Leaderboard for NLP

1 code implementation · ACL 2021 · PengFei Liu, Jinlan Fu, Yang Xiao, Weizhe Yuan, Shuaicheng Chang, Junqi Dai, Yixin Liu, Zihuiwen Ye, Zi-Yi Dou, Graham Neubig

In this paper, we present a new conceptualization and implementation of NLP evaluation: the ExplainaBoard, which, in addition to inheriting the functionality of the standard leaderboard, also allows researchers to (i) diagnose strengths and weaknesses of a single system (e.g., what is the best-performing system bad at?)

Machine Translation

Can We Automate Scientific Reviewing?

1 code implementation · 30 Jan 2021 · Weizhe Yuan, PengFei Liu, Graham Neubig

The rapid development of science and technology has been accompanied by an exponential growth in peer-reviewed scientific publications.

Review Generation
