Search Results for author: Yizhong Wang

Found 32 papers, 24 papers with code

Third-Party Language Model Performance Prediction from Instruction

1 code implementation • 19 Mar 2024 • Rahul Nadkarni, Yizhong Wang, Noah A. Smith

Language model-based instruction-following systems have recently shown improved performance on many benchmark tasks, demonstrating their ability to adapt to a broad variety of instructions.

Instruction Following • Language Modelling

Tur[k]ingBench: A Challenge Benchmark for Web Agents

no code implementations • 18 Mar 2024 • Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jack Zhang, Benjamin Van Durme, Daniel Khashabi

We evaluate the performance of state-of-the-art models, including language-only, vision-only, and layout-only models, and their combinations, on this benchmark.

Set the Clock: Temporal Alignment of Pretrained Language Models

1 code implementation • 26 Feb 2024 • Bowen Zhao, Zander Brumbaugh, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith

We then develop several methods, from prompting to finetuning, to align LMs to use their most recent knowledge when answering questions, and investigate various factors in this alignment.

Can Language Models Act as Knowledge Bases at Scale?

1 code implementation • 22 Feb 2024 • Qiyuan He, Yizhong Wang, Wenya Wang

Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating responses to complex queries through large-scale pre-training.

Natural Language Queries • World Knowledge

Tuning Language Models by Proxy

1 code implementation • 16 Jan 2024 • Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith

Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors.

Domain Adaptation • Math +1
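
Concretely, proxy tuning steers a large base model at decoding time by adding the logit offset between a small tuned "expert" and its untuned counterpart, so the base model's weights never change. A minimal sketch; the model names are placeholders and greedy decoding is an illustrative choice:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model names: proxy tuning only assumes the small tuned/untuned
# pair shares a vocabulary (and tokenizer) with the large base model.
base   = AutoModelForCausalLM.from_pretrained("base-model-large")
expert = AutoModelForCausalLM.from_pretrained("small-model-tuned")
anti   = AutoModelForCausalLM.from_pretrained("small-model-untuned")
tok    = AutoTokenizer.from_pretrained("base-model-large")

@torch.no_grad()
def proxy_next_token(input_ids):
    """One greedy decoding step with proxy-tuned logits."""
    logits = base(input_ids).logits[:, -1, :]
    # Shift the base model's logits by the small models' tuning offset.
    offset = expert(input_ids).logits[:, -1, :] - anti(input_ids).logits[:, -1, :]
    return torch.argmax(logits + offset, dim=-1)
```

Because the combination only touches next-token logits, the same offset can be dropped into any sampling loop.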

Fine-grained Hallucination Detection and Editing for Language Models

no code implementations • 12 Jan 2024 • Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Graham Neubig, Yulia Tsvetkov, Hannaneh Hajishirzi

On our benchmark, automatic and human evaluations show that FAVA significantly outperforms ChatGPT and GPT-4 on fine-grained hallucination detection, and that edits suggested by FAVA improve the factuality of LM-generated text.

Hallucination • Retrieval

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

2 code implementations • 17 Nov 2023 • Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

Since the release of T\"ULU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques.

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

2 code implementations • 17 Oct 2023 • Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi

Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens.

Fact Verification • Response Generation +1
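
Schematically, the decoding loop the abstract describes could look like the sketch below; the reflection-token names ([Retrieve], [IsRelevant], [IsSupported]) and the lm/retriever interfaces are hypothetical stand-ins for the paper's actual special tokens and models:

```python
def self_rag_answer(lm, retriever, query, k=5):
    """Hypothetical sketch of Self-RAG-style decoding with reflection tokens."""
    # The LM first emits a special token deciding whether retrieval is needed.
    if lm.predict_token(query) == "[Retrieve]":
        scored = []
        for passage in retriever.search(query, k=k):
            draft = lm.generate(query, context=passage)
            # Reflection tokens grade each draft's relevance and support.
            score = (lm.score_token(draft, "[IsRelevant]")
                     + lm.score_token(draft, "[IsSupported]"))
            scored.append((score, draft))
        return max(scored)[1]  # keep the best-critiqued generation
    # Otherwise answer directly from parametric knowledge.
    return lm.generate(query)
```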

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

1 code implementation • 17 Oct 2023 • Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu

In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.

Language Modelling • Large Language Model +2
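
The post-hoc parameter merging in the title can be sketched as a weighted average of per-preference model weights, combined at inference time according to a user's stated preference mix. A minimal sketch assuming PyTorch-style state dicts; the preference models in the usage comment are hypothetical:

```python
def merge_soup(state_dicts, weights):
    """Weighted average of model weights, one state dict per preference."""
    assert abs(sum(weights) - 1.0) < 1e-6, "mixing weights should sum to 1"
    return {name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
            for name in state_dicts[0]}

# e.g. a user who values conciseness twice as much as friendliness
# (hypothetical preference-specific models):
# merged = merge_soup([concise_sd, friendly_sd], weights=[2/3, 1/3])
```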

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

1 code implementation • 2 Oct 2023 • Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi

Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks.

Hallucination • Retrieval
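
One plausible reading of the title's binary token representations is sketched below: precompute passage token vectors offline, binarize them to one bit per dimension, and compare them with cheap elementwise ops. The sign-based binarization here is an assumption for illustration, not the paper's exact calibration recipe:

```python
import numpy as np

def binarize(token_reps: np.ndarray) -> np.ndarray:
    """Map real-valued token representations to +/-1 (one bit per dimension)."""
    return np.where(token_reps >= 0, 1, -1).astype(np.int8)

def bit_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of matching bits; a cheap stand-in for dot-product scoring."""
    return float((a == b).mean())
```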

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

1 code implementation • NeurIPS 2023 • Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap.

Instruction Following

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

1 code implementation • ICCV 2023 • Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A Smith

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA).

4k • Language Modelling +4
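
The metric the abstract describes reduces to a small loop: generate question-answer pairs from the text prompt, pose the questions to a VQA model over the generated image, and report accuracy. A sketch with hypothetical qg_model and vqa_model interfaces:

```python
def tifa_score(qg_model, vqa_model, prompt, image):
    """Faithfulness = VQA accuracy on questions derived from the text prompt."""
    qa_pairs = qg_model.questions_from(prompt)  # [(question, expected_answer), ...]
    correct = sum(vqa_model.answer(image, q) == a for q, a in qa_pairs)
    return correct / len(qa_pairs)
```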

Self-Instruct: Aligning Language Models with Self-Generated Instructions

16 code implementations • 20 Dec 2022 • Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi

Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.

Instruction Following • Language Modelling
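
The pipeline behind these numbers bootstraps training data from the model itself: sample existing tasks as in-context examples, ask the model for a new instruction, and keep it only if it is sufficiently novel (the paper filters with ROUGE-L similarity). A schematic sketch; lm.generate and the token-overlap proxy below are simplified placeholders:

```python
import random

def overlap(a: str, b: str) -> float:
    """Crude token-overlap proxy; the paper filters with ROUGE-L similarity."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(1, len(ta | tb))

def self_instruct(lm, seed_tasks, rounds=1000, icl_k=8):
    """Schematic Self-Instruct loop: the model generates its own training tasks."""
    pool = list(seed_tasks)
    for _ in range(rounds):
        examples = random.sample(pool, k=min(icl_k, len(pool)))
        new_task = lm.generate("Come up with a new task:\n" + "\n".join(examples))
        if all(overlap(new_task, t) < 0.7 for t in pool):  # keep only novel tasks
            pool.append(new_task)
    return pool  # the grown pool is then used to finetune the original model
```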

HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation

no code implementations • 20 Dec 2022 • Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew Peters

By converting instructions into modules, HINT models make compute usage effectively independent of the length of instructions and few-shot example inputs.

In-Context Learning
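
Compute becomes independent of instruction length because a hypernetwork encodes the instruction once into small parameter modules that are reused for every input. A rough PyTorch sketch; the dimensions and the low-rank adapter form are illustrative assumptions:

```python
import torch.nn as nn

class InstructionHypernet(nn.Module):
    """Illustrative hypernetwork: map a pooled instruction encoding to the
    weights of a small adapter, computed once and reused for every input."""
    def __init__(self, enc_dim=768, model_dim=768, rank=64):
        super().__init__()
        self.model_dim, self.rank = model_dim, rank
        self.gen_down = nn.Linear(enc_dim, model_dim * rank)
        self.gen_up = nn.Linear(enc_dim, rank * model_dim)

    def forward(self, instr_enc):  # instr_enc: (enc_dim,), pooled once per task
        w_down = self.gen_down(instr_enc).view(self.model_dim, self.rank)
        w_up = self.gen_up(instr_enc).view(self.rank, self.model_dim)
        return w_down, w_up  # plugged into the LM as adapter weights
```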

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

3 code implementations • 19 Dec 2022 • Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu

Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.

Information Retrieval • Learning Word Embeddings +3

Automated Lay Language Summarization of Biomedical Scientific Reviews

1 code implementation • 23 Dec 2020 • Yue Guo, Wei Qiu, Yizhong Wang, Trevor Cohen

Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes.

Data Augmentation

LiveQA: A Question Answering Dataset over Sports Live

2 code implementations • CCL 2020 • Qianying Liu, Sicong Jiang, Yizhong Wang, Sujian Li

In this paper, we introduce LiveQA, a new question answering dataset constructed from play-by-play live broadcasts.

Multiple-choice • Question Answering

Do NLP Models Know Numbers? Probing Numeracy in Embeddings

1 code implementation • IJCNLP 2019 • Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner

The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks.

Question Answering
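
A probing setup in the spirit of the paper: train a simple regressor to recover a number's value from its pretrained embedding, and treat the quality of the fit as evidence of numeracy. In the sketch below, embed() is a hypothetical stand-in for a lookup into pretrained word embeddings:

```python
import numpy as np
from sklearn.linear_model import Ridge

# embed() is a hypothetical stand-in for a pretrained word-embedding lookup.
numbers = np.arange(1, 100)
X = np.stack([embed(str(n)) for n in numbers])
probe = Ridge().fit(X, numbers)
print("value-decoding R^2:", probe.score(X, numbers))  # high R^2 => numeracy signal
```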

Machine Reading Comprehension: a Literature Review

no code implementations • 30 Jun 2019 • Xin Zhang, An Yang, Sujian Li, Yizhong Wang

Machine reading comprehension aims to teach machines to understand text as humans do, and is a challenging new direction in Artificial Intelligence.

Machine Reading Comprehension

Toward Fast and Accurate Neural Discourse Segmentation

1 code implementation • EMNLP 2018 • Yizhong Wang, Sujian Li, Jingfeng Yang

Discourse segmentation, which segments texts into Elementary Discourse Units, is a fundamental step in discourse analysis.

Discourse Segmentation

Bag-of-Words as Target for Neural Machine Translation

1 code implementation • ACL 2018 • Shuming Ma, Xu Sun, Yizhong Wang, Junyang Lin

However, most existing neural machine translation models use only one of the correct translations as the target, so the other correct translations are penalized as if they were incorrect during training.

Machine Translation • Sentence +1
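
The bag-of-words target can be rendered as an auxiliary loss: compare the model's averaged predicted word distribution against the order-free distribution of target words, so alternative correct orderings are not penalized. A simplified sketch, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def bow_loss(step_logits, target_ids, vocab_size):
    """Illustrative bag-of-words objective: reward generating the target
    sentence's words regardless of their order (simplified vs. the paper)."""
    # Average predicted next-word distributions over all decoding steps.
    probs = F.softmax(step_logits, dim=-1).mean(dim=0)          # (vocab,)
    # Count each target word to form the bag-of-words target distribution.
    bag = torch.zeros(vocab_size).scatter_add_(
        0, target_ids, torch.ones_like(target_ids, dtype=torch.float))
    bag = bag / bag.sum()
    return -(bag * torch.log(probs + 1e-9)).sum()               # cross-entropy
```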

Multi-Passage Machine Reading Comprehension with Cross-Passage Answer Verification

no code implementations • ACL 2018 • Yizhong Wang, Kai Liu, Jing Liu, Wei He, Yajuan Lyu, Hua Wu, Sujian Li, Haifeng Wang

Machine reading comprehension (MRC) on real web data usually requires the machine to answer a question by analyzing multiple passages retrieved by a search engine.

Machine Reading Comprehension • Question Answering
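
The verification step in the title can be sketched as answer re-scoring: a candidate extracted from one passage gains support when candidates from other passages agree with it. In the sketch below, `sim` is an assumed string-similarity function (e.g., token overlap):

```python
def cross_passage_verify(candidates, sim):
    """Illustrative cross-passage answer verification: boost candidates that
    other passages' candidate answers agree with.
    candidates: list of (answer_text, reader_score) pairs, one per passage."""
    best_score, best_answer = float("-inf"), None
    for i, (answer, reader_score) in enumerate(candidates):
        support = sum(sim(answer, other)
                      for j, (other, _) in enumerate(candidates) if j != i)
        if reader_score + support > best_score:
            best_score, best_answer = reader_score + support, answer
    return best_answer
```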

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

3 code implementations • WS 2018 • Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yu-An Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, Haifeng Wang

Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements.

Machine Reading Comprehension

A Two-Stage Parsing Method for Text-Level Discourse Analysis

1 code implementation • ACL 2017 • Yizhong Wang, Sujian Li, Houfeng Wang

Previous work introduced transition-based algorithms to form a unified architecture of parsing rhetorical structures (including span, nuclearity and relation), but did not achieve satisfactory performance.

Dependency Parsing • Document Summarization +4
