1 code implementation • 6 Dec 2024 • Abulhair Saparov, Srushti Pawar, Shreyas Pimpalgaonkar, Nitish Joshi, Richard Yuanzhe Pang, Vishakh Padmakumar, Seyed Mehran Kazemi, Najoung Kim, He He
This difficulty is not resolved even as the number of parameters is increased, suggesting that scaling alone will not yield robust search abilities.
no code implementations • 25 Nov 2024 • Yue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou
Reward modeling is crucial for aligning large language models (LLMs) with human preferences, especially in reinforcement learning from human feedback (RLHF).
no code implementations • 6 Nov 2024 • Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang, Jing Xu, Maryam Fazel-Zarandi, Mohit Bansal, Sainbayar Sukhbaatar, Jason Weston, Jane Yu
Self-alignment, whereby models learn to improve themselves without human annotation, is a rapidly growing research area.
no code implementations • 5 Aug 2024 • Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li
Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation.
no code implementations • 27 May 2024 • Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, Pietro Astolfi, Reyhane Askari Hemmat, Jun Chen, Kushal Tirumala, Rim Assouel, Mazda Moayeri, Arjang Talattof, Kamalika Chaudhuri, Zechun Liu, Xilun Chen, Quentin Garrido, Karen Ullrich, Aishwarya Agrawal, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
We then present and discuss approaches to evaluate vision-language models (VLMs).
no code implementations • 30 Apr 2024 • Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston
Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024; Chen et al., 2024).
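As a rough illustration of the objective family involved, the sketch below shows a DPO-style pairwise preference loss augmented with a negative log-likelihood (NLL) term on the preferred sequence, one common way to adapt iterative preference optimization to reasoning. The hyperparameter values, function name, and tensor interfaces are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def preference_plus_nll_loss(pol_logp_w, pol_logp_l, ref_logp_w, ref_logp_l,
                             beta=0.1, nll_weight=1.0):
    # Inputs are summed sequence log-probabilities of the preferred (w) and
    # dispreferred (l) responses under the policy and a frozen reference model.
    margin = (pol_logp_w - ref_logp_w) - (pol_logp_l - ref_logp_l)
    dpo_term = -F.logsigmoid(beta * margin)  # pairwise preference loss
    nll_term = -pol_logp_w                   # keeps mass on the preferred answer
    return (dpo_term + nll_weight * nll_term).mean()
```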
3 code implementations • 18 Jan 2024 • Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston
We posit that to achieve superhuman agents, future models require superhuman feedback to provide an adequate training signal.
2 code implementations • 20 Nov 2023 • David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
no code implementations • 26 Jul 2023 • Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston
We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations.
1 code implementation • NeurIPS 2023 • Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim, He He
Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity.
1 code implementation • 8 Mar 2023 • Vishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh
We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training.
no code implementations • 16 Nov 2022 • Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He
To align the outputs of conditional text generation models with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations.
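For reference, this training setup is, in its simplest form, a policy-gradient loop against a learned reward model; the sketch below assumes hypothetical `policy.sample` and `reward_model.score` interfaces and uses a mean-reward baseline for variance reduction.

```python
import torch

def rl_step(policy, reward_model, prompts, optimizer):
    samples, logprobs = policy.sample(prompts)          # hypothetical API
    with torch.no_grad():
        rewards = reward_model.score(prompts, samples)  # learned, hence imperfect
    advantage = rewards - rewards.mean()                # simple baseline
    loss = -(advantage * logprobs).mean()               # REINFORCE-style update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the reward function is learned rather than gold, a policy optimized against it can exploit the reward model's errors, which is a central risk in this kind of training.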
no code implementations • 26 Aug 2022 • Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman
We present the results of the NLP Community Metasurvey.
1 code implementation • 23 May 2022 • Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman
Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries -- which are nearly always in difficult-to-work-with technical domains -- or by using approximate heuristics to extract them from everyday text -- which frequently yields unfaithful summaries.
no code implementations • ACL 2022 • Le Hou, Richard Yuanzhe Pang, Tianyi Zhou, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou
Transformer-based models generally allocate the same amount of computation for each token in a given sequence.
3 code implementations • NAACL 2022 • Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman
To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process.
no code implementations • 16 Dec 2021 • Richard Yuanzhe Pang, He He, Kyunghyun Cho
For all three approaches, the generated translations fail to achieve rewards comparable to BSR, but their translation quality, as approximated by BLEU and BLEURT, is similar to that of the BSR-produced translations.
no code implementations • Findings (ACL) 2021 • Richard Yuanzhe Pang, Adam D. Lelkes, Vinh Q. Tran, Cong Yu
Given the lack of existing datasets, we create one for AgreeSum and provide article-summary entailment annotations for a subset of the clusters in the dataset.
no code implementations • ACL 2021 • Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman
Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks.
1 code implementation • ICLR 2021 • Richard Yuanzhe Pang, He He
Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation.
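For concreteness, the standard objective referred to here is token-level maximum likelihood, i.e., average next-token cross-entropy; a minimal sketch follows, with the padding id as an illustrative assumption.

```python
import torch.nn.functional as F

def mle_loss(logits, targets, pad_id=0):
    # logits: (batch, seq_len, vocab); targets: (batch, seq_len) next-token ids.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,  # do not penalize padding positions
    )
```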
no code implementations • ACL 2020 • Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman
However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks.
1 code implementation • ACL 2020 • Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, Kevin Gimpel
We propose to train a non-autoregressive machine translation model to minimize the energy defined by a pretrained autoregressive model.
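A minimal sketch of one training step under this idea, assuming hypothetical `nar_model` and `ar_model.log_prob` interfaces: the energy is taken to be the frozen autoregressive model's negative log-likelihood of a differentiable relaxation of the non-autoregressive output, with `tau` as an illustrative relaxation temperature.

```python
import torch

def energy_step(nar_model, ar_model, src, optimizer, tau=1.0):
    logits = nar_model(src)                     # (batch, tgt_len, vocab)
    soft_out = torch.softmax(logits / tau, -1)  # differentiable relaxation
    # The AR model is frozen; gradients flow only through soft_out.
    energy = -ar_model.log_prob(src, soft_out)  # assumed to accept soft targets
    loss = energy.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```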
1 code implementation • EMNLP 2020 • Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho
Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition.
no code implementations • EMNLP (spnlp) 2020 • Lifu Tu, Richard Yuanzhe Pang, Kevin Gimpel
Deep energy-based models are powerful, but pose challenges for learning and inference (Belanger and McCallum, 2016).
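To make the difficulty concrete: in the standard formulation (not necessarily the paper's exact notation), a conditional energy-based model defines a distribution only up to a normalizer that sums over the entire output space,

```latex
p_\theta(y \mid x) = \frac{\exp(-E_\theta(x, y))}{\sum_{y'} \exp(-E_\theta(x, y'))},
```

and for sequence outputs that sum is exponentially large, which is what makes both maximum-likelihood learning and exact inference hard.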
no code implementations • WS 2019 • Richard Yuanzhe Pang
For the task of automatically generating paraphrases with modified styles or attributes, the main difficulty lies in the lack of parallel corpora.
no code implementations • 9 Oct 2019 • Richard Yuanzhe Pang
The difficulty of textual style transfer lies in the lack of parallel corpora.
no code implementations • WS 2019 • Richard Yuanzhe Pang, Kevin Gimpel
We show that the metric of post-transfer classification accuracy is insufficient on its own, and propose additional metrics based on semantic preservation and fluency as well as a way to combine them into a single overall score.
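As one illustration of such a combination (a plausible sketch, not necessarily the paper's exact aggregation rule), a geometric mean of per-example scores, each assumed to lie in [0, 1], penalizes transfers that fail any single criterion:

```python
def overall_score(style_acc, semantic_sim, fluency):
    # Geometric mean: a transfer must do well on all three axes to score well.
    return (style_acc * semantic_sim * fluency) ** (1.0 / 3.0)
```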