1 code implementation • 18 Sep 2024 • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett
Chain-of-thought (CoT) prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).
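A minimal sketch of what distinguishes a chain-of-thought prompt from a direct-answer prompt; the exemplar and question below are illustrative placeholders, not taken from the paper.

```python
# Minimal sketch: assembling a direct prompt vs. a chain-of-thought (CoT) prompt.
# The exemplar and question are illustrative placeholders, not from the paper.

DIRECT_EXEMPLAR = (
    "Q: A pack has 12 pencils and 3 are used. How many remain?\n"
    "A: 9\n"
)

COT_EXEMPLAR = (
    "Q: A pack has 12 pencils and 3 are used. How many remain?\n"
    "A: The pack starts with 12 pencils. Using 3 leaves 12 - 3 = 9. The answer is 9.\n"
)

def build_prompt(question: str, use_cot: bool) -> str:
    """Prepend one in-context exemplar; the CoT variant spells out intermediate steps."""
    exemplar = COT_EXEMPLAR if use_cot else DIRECT_EXEMPLAR
    return f"{exemplar}\nQ: {question}\nA:"

print(build_prompt("A shelf holds 24 books and 7 are removed. How many remain?", use_cot=True))
```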
no code implementations • 8 Jul 2024 • Zeyu Leo Liu, Shrey Pandit, Xi Ye, Eunsol Choi, Greg Durrett
An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example that uses the updated functionality; our goal is to update an LLM to be able to solve this program synthesis example without providing documentation of the update at inference time.
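A hypothetical sketch of one benchmark instance as described above: a synthetic API update paired with a program-synthesis example that requires the new behavior. The field names and example are assumptions for illustration, not the benchmark's actual schema.

```python
# Hypothetical schema for one instance: a synthetic API update plus a
# program-synthesis example exercising the updated functionality.
from dataclasses import dataclass

@dataclass
class UpdateInstance:
    function_name: str        # API whose behavior changed
    updated_docstring: str    # documentation of the update (hidden at inference time)
    update_code: str          # new implementation used to teach the model
    synthesis_prompt: str     # program-synthesis task that requires the update
    reference_solution: str   # gold program using the updated functionality

example = UpdateInstance(
    function_name="sort_records",
    updated_docstring="sort_records now accepts a `reverse` keyword argument.",
    update_code="def sort_records(xs, reverse=False): return sorted(xs, reverse=reverse)",
    synthesis_prompt="Write a call that sorts `records` in descending order.",
    reference_solution="sort_records(records, reverse=True)",
)
```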
1 code implementation • 3 Jun 2024 • Fangcong Yin, Xi Ye, Greg Durrett
For truthfulness and reasoning tasks, we find that LoFiT's intervention vectors are more effective for LLM adaptation than vectors from representation intervention methods such as Inference-time Intervention.
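A rough sketch of the general idea of intervention vectors: learnable offsets added to the outputs of a selected subset of attention heads. This illustrates the concept only; the head-selection step, shapes, and module structure are assumptions, not the authors' LoFiT implementation.

```python
# Concept sketch: add a learned offset vector to each selected attention head's output.
import torch
import torch.nn as nn

class HeadOffsets(nn.Module):
    def __init__(self, selected_heads: list[tuple[int, int]], head_dim: int):
        super().__init__()
        # One learnable offset vector per (layer, head) pair chosen for intervention.
        self.offsets = nn.ParameterDict({
            f"{layer}_{head}": nn.Parameter(torch.zeros(head_dim))
            for layer, head in selected_heads
        })

    def intervene(self, layer: int, head: int, head_output: torch.Tensor) -> torch.Tensor:
        """Add the learned offset to one head's output (batch, seq_len, head_dim)."""
        key = f"{layer}_{head}"
        return head_output + self.offsets[key] if key in self.offsets else head_output

offsets = HeadOffsets(selected_heads=[(10, 3), (12, 7)], head_dim=64)
dummy = torch.randn(2, 16, 64)             # (batch, seq_len, head_dim)
patched = offsets.intervene(10, 3, dummy)  # intervened head output
```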
no code implementations • 30 May 2024 • Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye, Xianjun Yang, Lichang Chen, William Yang Wang, Linda Ruth Petzold
First, coding data tuning enhances the overall reasoning capabilities of LLMs across different model families and scales.
1 code implementation • 6 May 2024 • Zhizhao Duan, Hao Cheng, Duo Xu, Xi Wu, Xiangxie Zhang, Xi Ye, Zhen Xie
In the vast and dynamic landscape of urban settings, Traffic Safety Description and Analysis plays a pivotal role in applications ranging from insurance inspection to accident prevention.
no code implementations • 18 Apr 2024 • Yoonsang Lee, Xi Ye, Eunsol Choi
Given a question and a set of documents discussing different people named Michael Jordan, can LMs distinguish entity mentions to generate a cohesive answer to the question?
1 code implementation • 11 Dec 2023 • Xi Ye, Guillaume-Alexandre Bilodeau
Predicting future frames of a video is challenging because it is difficult to learn the uncertainty of the underlying factors influencing their contents.
1 code implementation • 16 Nov 2023 • Yoonsang Lee, Pranav Atreya, Xi Ye, Eunsol Choi
We perform analysis on three multi-answer question answering datasets, which allows us to further study answer set ordering strategies based on the LM's knowledge of each answer.
no code implementations • 16 Nov 2023 • Xi Ye, Ruoxi Sun, Sercan Ö. Arik, Tomas Pfister
Our framework tunes LLMs to self-ground the claims in their responses and provide accurate citations to retrieved documents.
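A minimal sketch of the kind of citation-grounded output described above: each claim cites retrieved documents by ID, and a simple check verifies that every cited ID resolves to a retrieved document. The bracketed citation format and regex are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch: verify that citation markers in a response refer to retrieved documents.
import re

retrieved_docs = {
    "D1": "The Amazon River is approximately 6,400 km long.",
    "D2": "The Nile is approximately 6,650 km long.",
}

response = "The Nile is about 6,650 km long [D2], slightly longer than the Amazon [D1]."

def cited_ids(text: str) -> set[str]:
    """Extract citation markers of the assumed form [D<number>]."""
    return set(re.findall(r"\[(D\d+)\]", text))

unknown = cited_ids(response) - retrieved_docs.keys()
print("all citations resolve" if not unknown else f"unresolved citations: {unknown}")
```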
3 code implementations • 24 Oct 2023 • Zayne Sprague, Xi Ye, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett
We evaluate a range of LLMs and prompting techniques on this dataset and characterize the gaps that remain for techniques like chain-of-thought to perform robust reasoning.
1 code implementation • 1 Jun 2023 • Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett
Standard decoding approaches for conditional text generation tasks typically search for an output hypothesis with high model probability, but this may not yield the best hypothesis according to human judgments of quality.
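A toy sketch of the gap described above: the hypothesis with the highest model probability is not necessarily the one a separate quality estimate (standing in for human judgment) prefers. The candidates, log-probabilities, and quality function are illustrative placeholders.

```python
# Toy illustration: model probability and an external quality estimate can disagree.
candidates = [
    ("The cat sat.", -1.2),                               # (hypothesis, model log-probability)
    ("The cat sat on the mat near the window.", -2.8),
]

def quality_estimate(text: str) -> float:
    """Stand-in scorer; here it simply rewards more informative (longer) outputs."""
    return float(len(text.split()))

best_by_model = max(candidates, key=lambda c: c[1])[0]
best_by_quality = max(candidates, key=lambda c: quality_estimate(c[0]))[0]
print(best_by_model == best_by_quality)  # False: the two criteria pick different hypotheses
```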
1 code implementation • NeurIPS 2023 • Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett
In this paper, we propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of LLMs.
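A minimal sketch of the satisfiability-aided idea: rather than having the LLM compute the answer step by step, it emits declarative constraints and an off-the-shelf solver (here z3) derives the answer. The word problem and constraints are illustrative, not the paper's prompt format or pipeline.

```python
# Sketch: solve a toy word problem by handing declarative constraints to an SMT solver.
from z3 import Int, Solver, sat  # pip install z3-solver

# "Alice has 3 more apples than Bob; together they have 11."
alice, bob = Int("alice"), Int("bob")
solver = Solver()
solver.add(alice == bob + 3, alice + bob == 11)

if solver.check() == sat:
    model = solver.model()
    print("alice =", model[alice], "bob =", model[bob])  # alice = 7, bob = 4
```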
1 code implementation • 9 Feb 2023 • Xi Ye, Greg Durrett
We first generate sets of candidate explanations for each example in the prompt using a leave-one-out scheme, then find an effective combination of these explanations with a two-stage framework.
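A schematic sketch of the leave-one-out step described above: to draft candidate explanations for example i, the remaining prompt examples (with their existing explanations) serve as context. `generate` is a placeholder for an LLM sampler, not a real API, and the two-stage combination search is omitted.

```python
# Sketch of leave-one-out candidate generation for prompt explanations.
from typing import Callable

def leave_one_out_candidates(
    examples: list[dict],                  # each: {"question", "answer", "explanation"}
    generate: Callable[[str], list[str]],  # placeholder LLM sampler returning candidates
) -> dict[int, list[str]]:
    candidates = {}
    for i, target in enumerate(examples):
        context = "\n".join(
            f"Q: {ex['question']}\nE: {ex['explanation']}\nA: {ex['answer']}"
            for j, ex in enumerate(examples) if j != i
        )
        prompt = f"{context}\nQ: {target['question']}\nE:"
        candidates[i] = generate(prompt)  # several sampled explanations for example i
    return candidates
```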
1 code implementation • 12 Dec 2022 • Xi Ye, Guillaume-Alexandre Bilodeau
Video prediction is a challenging computer vision task that has a wide range of applications.
1 code implementation • 25 Nov 2022 • Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, Ramakanth Pasunuru
Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts, but there has been limited understanding of exactly how these explanations function or why they are effective.
no code implementations • 13 Oct 2022 • Prasann Singhal, Jarad Forristal, Xi Ye, Greg Durrett
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion: given a few target-domain examples and a set of models with similar training performance, can we understand how these models will perform on OOD test data?
1 code implementation • 11 Oct 2022 • Xi Ye, Guillaume-Alexandre Bilodeau
Different conditional video prediction tasks, like video future frame prediction and video frame interpolation, are typically solved by task-specific models even though they share many common underlying characteristics.
no code implementations • 9 Jun 2022 • Weikai Yang, Xi Ye, Xingxing Zhang, Lanxi Xiao, Jiazhi Xia, Zhongyuan Wang, Jun Zhu, Hanspeter Pfister, Shixia Liu
The base learners and labeled samples (shots) in an ensemble few-shot classifier greatly affect model performance.
1 code implementation • 6 May 2022 • Xi Ye, Greg Durrett
Does prompting a large language model (LLM) like GPT-3 with explanations improve in-context learning?
1 code implementation • 29 Mar 2022 • Xi Ye, Guillaume-Alexandre Bilodeau
Building on this new Transformer block, we propose a fully autoregressive Transformer for video future frame prediction.
2 code implementations • ACL 2022 • Xi Ye, Greg Durrett
Our approach first extracts a set of features combining human intuition about the task with model attributions generated by black box interpretation techniques, then uses a simple calibrator, in the form of a classifier, to predict whether the base model was correct or not.
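A minimal sketch of the calibration setup described above: featurize each example and train a simple classifier to predict whether the base model's prediction was correct. The real features combine task heuristics with attribution scores; random numbers stand in here, so this illustrates the setup rather than the paper's feature set.

```python
# Sketch: train a simple calibrator to predict base-model correctness from features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))               # placeholder per-example features
model_was_correct = rng.integers(0, 2, size=200)   # placeholder correctness labels

calibrator = LogisticRegression().fit(features, model_was_correct)
confidence = calibrator.predict_proba(features[:1])[0, 1]
print(f"estimated probability the base model is correct: {confidence:.2f}")
```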
1 code implementation • ACL 2022 • Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong
We present RnG-KBQA, a Rank-and-Generate approach for KBQA, which remedies the coverage issue with a generation model while preserving a strong generalization capability.
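A schematic sketch of a rank-then-generate pipeline of the kind described above: enumerated candidate logical forms are first ranked, then a generation step produces the final form conditioned on the question and the top-ranked candidates. The scorer and generator here are stubs, not the RnG-KBQA models.

```python
# Sketch of a rank-then-generate pipeline with placeholder scorer and generator.
from typing import Callable

def rank_and_generate(
    question: str,
    candidates: list[str],                      # enumerated candidate logical forms
    score: Callable[[str, str], float],         # ranker: (question, candidate) -> score
    generate: Callable[[str, list[str]], str],  # generator conditioned on top candidates
    top_k: int = 5,
) -> str:
    ranked = sorted(candidates, key=lambda c: score(question, c), reverse=True)
    return generate(question, ranked[:top_k])
```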
1 code implementation • EMNLP 2021 • Xi Ye, Rohan Nair, Greg Durrett
When a model attribution technique highlights a particular part of the input, a user might understand this highlight as making a statement about counterfactuals (Miller, 2019): if that part of the input were to change, the model's prediction might change as well.
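A toy sketch of this counterfactual reading of an attribution highlight: if removing the highlighted span flips the model's prediction, the highlight behaves like a counterfactual explanation. The classifier here is a trivial stand-in, not a trained model from the paper.

```python
# Toy check: does removing the highlighted span change the model's prediction?
def toy_sentiment(text: str) -> str:
    return "positive" if "great" in text.lower() else "negative"

sentence = "The plot was thin but the acting was great."
highlight = "great"  # span flagged by some attribution technique

original = toy_sentiment(sentence)
ablated = toy_sentiment(sentence.replace(highlight, ""))
print(f"original: {original}, without highlight: {ablated}, flipped: {original != ablated}")
```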
no code implementations • Findings (EMNLP) 2021 • Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett
Multimodal program synthesis, which leverages different types of user input to synthesize a desired program, is an attractive way to scale program synthesis to challenging settings; however, it requires integrating noisy signals from the user, like natural language, with hard constraints on the program's behavior.
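A small sketch of combining a noisy natural-language signal with hard constraints: candidate regexes (as if proposed by a model from the description) are filtered by positive and negative string examples that the final program must satisfy. The candidates and examples are illustrative.

```python
# Sketch: prune noisy NL-derived regex candidates with hard input/output constraints.
import re

description = "three digits followed by a dash and two letters"
candidates = [r"\d{3}-[a-z]{2}", r"\d+-[a-z]+", r"\d{2}-[a-z]{2}"]  # noisy guesses

positive = ["123-ab", "987-zz"]   # must match
negative = ["12-ab", "1234-ab"]   # must not match

def satisfies(pattern: str) -> bool:
    return (all(re.fullmatch(pattern, s) for s in positive)
            and not any(re.fullmatch(pattern, s) for s in negative))

consistent = [p for p in candidates if satisfies(p)]
print(consistent)  # only the candidate consistent with all examples survives
```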
no code implementations • ACL 2020 • Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett
Existing datasets for regular expression (regex) generation from natural language are limited in complexity; compared to regex tasks that users post on StackOverflow, the regexes in these datasets are simple, and the language used to describe them is not diverse.
1 code implementation • 16 Aug 2019 • Xi Ye, Qiaochu Chen, Xinyu Wang, Isil Dillig, Greg Durrett
Our system achieves state-of-the-art performance on the prior datasets and solves 57% of the real-world dataset, on which existing neural systems completely fail.