Search Results for author: Richard Yuanzhe Pang

Found 28 papers, 10 papers with code

Transformers Struggle to Learn to Search

1 code implementation • 6 Dec 2024 • Abulhair Saparov, Srushti Pawar, Shreyas Pimpalgaonkar, Nitish Joshi, Richard Yuanzhe Pang, Vishakh Padmakumar, Seyed Mehran Kazemi, Najoung Kim, He He

This difficulty is not resolved even as the number of parameters is increased, suggesting that increasing model scale will not lead to robust search abilities.

Self-Generated Critiques Boost Reward Modeling for Language Models

no code implementations • 25 Nov 2024 • Yue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou

Reward modeling is crucial for aligning large language models (LLMs) with human preferences, especially in reinforcement learning from human feedback (RLHF).

Self-Consistency Preference Optimization

no code implementations • 6 Nov 2024 • Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang, Jing Xu, Maryam Fazel-Zarandi, Mohit Bansal, Sainbayar Sukhbaatar, Jason Weston, Jane Yu

Self-alignment, whereby models learn to improve themselves without human annotation, is a rapidly growing research area.

GSM8K Math
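
The core recipe lends itself to a short sketch. Below is a minimal illustration, assuming the method turns self-consistency votes over sampled solutions into preference pairs; the pairing heuristic and all names here are illustrative, not the paper's exact procedure.

```python
from collections import Counter

def self_consistency_pairs(samples):
    # samples: list of (final_answer, full_response) for one question.
    # The majority-vote answer acts as an unsupervised label: responses
    # reaching it become "chosen", the rest "rejected".
    votes = Counter(answer for answer, _ in samples)
    majority = votes.most_common(1)[0][0]
    chosen = [r for a, r in samples if a == majority]
    rejected = [r for a, r in samples if a != majority]
    return [(c, r) for c in chosen for r in rejected]

# Toy usage with five sampled solutions to one GSM8K-style question.
samples = [("42", "...so the answer is 42"), ("42", "...therefore 42"),
           ("40", "...giving 40"), ("42", "...hence 42"), ("13", "...thus 13")]
print(len(self_consistency_pairs(samples)))  # 3 chosen x 2 rejected = 6 pairs
```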

Self-Taught Evaluators

no code implementations • 5 Aug 2024 • Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li

Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation.

Iterative Reasoning Preference Optimization

no code implementations • 30 Apr 2024 • Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024; Chen et al., 2024).

ARC GSM8K +1
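
The shape of the iterative loop is easy to sketch. In the outline below, the helper callables (`sample`, `final_answer`, `update`) are assumptions standing in for generation, answer extraction, and the preference-optimization step; this shows the loop structure, not the paper's exact algorithm.

```python
def iterative_preference_loop(model, prompts, gold, sample, final_answer,
                              update, n_iters=3, k=8):
    # Each iteration: sample k chain-of-thought solutions per prompt, split
    # them into winners/losers by final-answer correctness, then run one
    # preference-optimization update (the paper pairs DPO with an NLL term).
    for _ in range(n_iters):
        pairs = []
        for p in prompts:
            sols = [sample(model, p) for _ in range(k)]
            winners = [s for s in sols if final_answer(s) == gold[p]]
            losers = [s for s in sols if final_answer(s) != gold[p]]
            if winners and losers:
                pairs.append((p, winners[0], losers[0]))
        model = update(model, pairs)
    return model
```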

Self-Rewarding Language Models

3 code implementations • 18 Jan 2024 • Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal.

Instruction Following Language Modeling +1

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

2 code implementations • 20 Nov 2023 • David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.

Multiple-choice

Leveraging Implicit Feedback from Deployment Data in Dialogue

no code implementations • 26 Jul 2023 • Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston

We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations.

Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples

1 code implementation • NeurIPS 2023 • Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim, He He

Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity.

Extrapolative Controlled Sequence Generation via Iterative Refinement

1 code implementation • 8 Mar 2023 • Vishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh

We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training.

Attribute Drug Discovery +1
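
As a rough analogue of this setup (not the paper's learned editor), iterative refinement can be pictured as repeated local edits that are kept only when they push an attribute score higher than anything seen so far; everything below is an illustrative stand-in.

```python
import random

def refine(seed, score, propose, steps=200):
    # Keep a candidate only when a local edit improves the attribute score,
    # gradually pushing past the range of the starting sequence.
    best, best_score = seed, score(seed)
    for _ in range(steps):
        candidate = propose(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

def propose(x):  # random single-character edit (toy sequence space)
    i = random.randrange(len(x))
    return x[:i] + random.choice("0123456789") + x[i + 1:]

random.seed(0)
print(refine("10000", lambda s: sum(map(int, s)), propose))  # climbs toward "99999"
```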

Reward Gaming in Conditional Text Generation

no code implementations • 16 Nov 2022 • Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He

To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations.

Conditional Text Generation Reinforcement Learning (RL)
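
A tiny simulation conveys why this setting invites reward gaming: a learned reward that errs on even a small fraction of outputs can dominate what a reward-maximizing policy converges to. All numbers below are made up for illustration.

```python
import random

random.seed(0)
outputs = list(range(1000))
true_reward = {o: random.random() for o in outputs}
learned_reward = dict(true_reward)
for o in random.sample(outputs, 10):        # 1% of outputs mislabeled...
    learned_reward[o] = true_reward[o] + 5  # ...with large overestimates

best = max(outputs, key=learned_reward.get)  # what reward-maximizing RL finds
print(f"learned: {learned_reward[best]:.2f}  true: {true_reward[best]:.2f}")
```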

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way

1 code implementation • 23 May 2022 • Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman

Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries -- which are nearly always in difficult-to-work-with technical domains -- or by using approximate heuristics to extract them from everyday text -- which frequently yields unfaithful summaries.

Document Summarization Multiple-choice

QuALITY: Question Answering with Long Input Texts, Yes!

3 code implementations • NAACL 2022 • Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman

To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process.

Multiple-choice Multiple Choice Question Answering (MCQA)

Amortized Noisy Channel Neural Machine Translation

no code implementations • 16 Dec 2021 • Richard Yuanzhe Pang, He He, Kyunghyun Cho

For all three approaches, the generated translations fail to achieve rewards comparable to beam search reranking (BSR), but their translation quality, as approximated by BLEU and BLEURT, is similar to that of BSR-produced translations.

Imitation Learning Knowledge Distillation +4
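
For context, noisy-channel reranking (the objective being amortized here, which BSR optimizes via beam search plus reranking) scores a candidate translation with a direct term, a channel term, and a language-model prior. A minimal sketch, with illustrative weights rather than the paper's tuned values:

```python
def noisy_channel_score(log_p_fwd, log_p_rev, log_p_lm, lam=1.0, mu=0.3):
    # log p(y|x) + lam * log p(x|y) + mu * log p(y)
    return log_p_fwd + lam * log_p_rev + mu * log_p_lm

def rerank(candidates):
    # candidates: list of (translation, log_p_fwd, log_p_rev, log_p_lm)
    return max(candidates, key=lambda c: noisy_channel_score(*c[1:]))

beam = [("die katze sitzt", -1.2, -2.5, -3.0),
        ("die katze sass", -1.0, -4.0, -3.5)]
print(rerank(beam)[0])  # the channel term overrules the direct model here
```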

AgreeSum: Agreement-Oriented Multi-Document Summarization

no code implementations • Findings (ACL) 2021 • Richard Yuanzhe Pang, Adam D. Lelkes, Vinh Q. Tran, Cong Yu

Given the lack of existing datasets, we create a dataset for AgreeSum, and provide annotations on article-summary entailment relations for a subset of the clusters in the dataset.

Abstractive Text Summarization Document Summarization +1

Text Generation by Learning from Demonstrations

1 code implementation • ICLR 2021 • Richard Yuanzhe Pang, He He

Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation.

Machine Translation Question Generation +4

ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation

1 code implementation • ACL 2020 • Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, Kevin Gimpel

We propose to train a non-autoregressive machine translation model to minimize the energy defined by a pretrained autoregressive model.

de-en Machine Translation +1
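
A minimal sketch of the idea, assuming the autoregressive model's per-position log-probabilities are available as a fixed tensor (a simplification: the actual method scores the NAR model's relaxed outputs with the AR model itself):

```python
import torch
import torch.nn.functional as F

def energy_loss(nar_logits, ar_log_probs):
    # nar_logits: [T, V] logits from the non-autoregressive model.
    # ar_log_probs: [T, V] log-probs from the pretrained autoregressive model.
    # Relax the NAR output to a distribution and minimize its expected
    # negative log-probability (the "energy") under the AR model.
    q = F.softmax(nar_logits, dim=-1)
    return -(q * ar_log_probs).sum()

T, V = 5, 100
nar_logits = torch.randn(T, V, requires_grad=True)
ar_log_probs = torch.log_softmax(torch.randn(T, V), dim=-1)
energy_loss(nar_logits, ar_log_probs).backward()  # gradients flow to the NAR model
```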

Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

1 code implementation • EMNLP 2020 • Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho

Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition.

Language Modeling Language Modelling
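
One concrete consequence studied in the paper: truncation-based decoders such as top-k sampling can be inconsistent, i.e., they may assign nonzero probability to never terminating, because EOS can be truncated away at every step. A minimal sketch of a consistency-restoring filter (variable names are ours):

```python
import torch

def consistent_topk(logits, k, eos_id):
    # Standard top-k truncation can drop EOS at every step, so sampling may
    # never terminate; forcing EOS into the candidate set restores consistency.
    keep = torch.zeros_like(logits, dtype=torch.bool)
    keep[torch.topk(logits, k).indices] = True
    keep[eos_id] = True  # always allow termination
    return torch.softmax(logits.masked_fill(~keep, float("-inf")), dim=-1)

probs = consistent_topk(torch.randn(50), k=5, eos_id=0)
print(probs[0] > 0)  # tensor(True): EOS is always sampleable
```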

Towards Actual (Not Operational) Textual Style Transfer Auto-Evaluation

no code implementations • WS 2019 • Richard Yuanzhe Pang

For automatically generating paraphrases with modified styles or attributes, the difficulty lies in the lack of parallel corpora.

Semantic Similarity Semantic Textual Similarity +1

The Daunting Task of Real-World Textual Style Transfer Auto-Evaluation

no code implementations • 9 Oct 2019 • Richard Yuanzhe Pang

The difficulty of textual style transfer lies in the lack of parallel corpora.

Style Transfer

Unsupervised Evaluation Metrics and Learning Criteria for Non-Parallel Textual Transfer

no code implementations • WS 2019 • Richard Yuanzhe Pang, Kevin Gimpel

We show that the metric of post-transfer classification accuracy is insufficient on its own, and propose additional metrics based on semantic preservation and fluency as well as a way to combine them into a single overall score.

Sentence
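
The combination step can be pictured as below; a geometric mean is shown only as one plausible aggregation (the paper's exact combination may differ), with each component metric assumed normalized to [0, 1].

```python
def overall_score(accuracy, semantic, fluency):
    # Geometric mean: a transfer must do well on all three axes at once,
    # so gaming post-transfer classification accuracy alone cannot win.
    return (accuracy * semantic * fluency) ** (1 / 3)

print(round(overall_score(0.90, 0.70, 0.80), 3))  # 0.796
print(round(overall_score(0.99, 0.20, 0.90), 3))  # 0.563: high accuracy, lost meaning
```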
