Search Results for author: Xi Ye

Found 26 papers, 18 papers with code

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

1 code implementation • 18 Sep 2024 • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett

Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).

Math · MMLU
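
For readers unfamiliar with the technique, a minimal sketch of chain-of-thought prompting is shown below; the demonstration wording and the arithmetic questions are illustrative assumptions, not examples taken from the paper.

    # Illustrative few-shot chain-of-thought prompt (wording is assumed, not from
    # the paper). The demonstration shows intermediate reasoning steps before the
    # final answer, which is what CoT prompting is meant to elicit.

    demonstration = (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. "
        "How many balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
        "5 + 6 = 11. The answer is 11.\n\n"
    )

    question = "Q: A baker makes 4 trays of 12 cookies and sells 30. How many are left?\nA:"

    prompt = demonstration + question
    print(prompt)  # send `prompt` to any LLM completion endpoint of your choice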

CodeUpdateArena: Benchmarking Knowledge Editing on API Updates

no code implementations • 8 Jul 2024 • Zeyu Leo Liu, Shrey Pandit, Xi Ye, Eunsol Choi, Greg Durrett

An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example that uses the updated functionality; our goal is to update an LLM to be able to solve this program synthesis example without providing documentation of the update at inference time.

Benchmarking · knowledge editing +1
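
A rough sketch of how one such benchmark instance might be represented in Python is given below; the field names and the example update are hypothetical illustrations, not the dataset's actual schema.

    from dataclasses import dataclass

    # Hypothetical shape of a single CodeUpdateArena-style instance: a synthetic
    # API update plus a program-synthesis problem that can only be solved
    # correctly if the model has internalized the update. Field names are
    # illustrative assumptions.

    @dataclass
    class CodeUpdateInstance:
        api_name: str            # e.g. "numpy.argsort"
        update_description: str  # synthetic change to the API's behavior
        updated_signature: str   # signature after the synthetic update
        synthesis_prompt: str    # task that requires the updated functionality
        unit_tests: list[str]    # tests used to check the synthesized program

    example = CodeUpdateInstance(
        api_name="numpy.argsort",
        update_description="adds a keyword `stable=True` enforcing stable sort",
        updated_signature="argsort(a, axis=-1, kind=None, order=None, stable=False)",
        synthesis_prompt="Write a function that ranks items, breaking ties by input order.",
        unit_tests=["assert rank([2, 1, 2]) == [1, 0, 2]"],
    )
    print(example.api_name)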

LoFiT: Localized Fine-tuning on LLM Representations

1 code implementation • 3 Jun 2024 • Fangcong Yin, Xi Ye, Greg Durrett

For truthfulness and reasoning tasks, we find that LoFiT's intervention vectors are more effective for LLM adaptation than vectors from representation intervention methods such as Inference-time Intervention.

parameter-efficient fine-tuning
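
As a rough illustration of the general idea of localized representation intervention, the sketch below adds learned offset vectors to the outputs of a few selected attention heads while the base model stays frozen; the head indices and dimensions are assumptions, and LoFiT's actual head-selection and training procedure are not reproduced here.

    import torch

    # Generic sketch: learned offset vectors are added to the hidden states of a
    # few selected attention heads while the base model's weights stay frozen.
    # This shows the general mechanism only, not the authors' implementation.

    hidden_size, num_heads = 4096, 32
    head_dim = hidden_size // num_heads
    selected_heads = [(10, 3), (15, 7)]  # (layer, head) pairs chosen for tuning

    # One trainable offset vector per selected head.
    offsets = {lh: torch.nn.Parameter(torch.zeros(head_dim)) for lh in selected_heads}

    def intervene(head_output: torch.Tensor, layer: int, head: int) -> torch.Tensor:
        """Add the learned offset to a head's output if that head is selected."""
        if (layer, head) in offsets:
            return head_output + offsets[(layer, head)]
        return head_output

    # Example: fake output of layer 10, head 3 for a batch of 2 tokens.
    out = torch.randn(2, head_dim)
    print(intervene(out, layer=10, head=3).shape)  # torch.Size([2, 128])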

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

no code implementations • 30 May 2024 • Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye, Xianjun Yang, Lichang Chen, William Yang Wang, Linda Ruth Petzold

First, coding data tuning enhances the overall reasoning capabilities of LLMs across different model families and scales.

CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario

1 code implementation • 6 May 2024 • Zhizhao Duan, Hao Cheng, Duo Xu, Xi Wu, Xiangxie Zhang, Xi Ye, Zhen Xie

In the vast and dynamic landscape of urban settings, Traffic Safety Description and Analysis plays a pivotal role in applications ranging from insurance inspection to accident prevention.

Position · Prompt Engineering

AmbigDocs: Reasoning across Documents on Different Entities under the Same Name

no code implementations • 18 Apr 2024 • Yoonsang Lee, Xi Ye, Eunsol Choi

Given a question and a set of documents discussing different people named Michael Jordan, can LMs distinguish entity mentions to generate a cohesive answer to the question?

STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction

1 code implementation • 11 Dec 2023 • Xi Ye, Guillaume-Alexandre Bilodeau

Predicting future frames of a video is challenging because it is difficult to learn the uncertainty of the underlying factors influencing their contents.

Video Prediction

Crafting In-context Examples according to LMs' Parametric Knowledge

1 code implementation • 16 Nov 2023 • Yoonsang Lee, Pranav Atreya, Xi Ye, Eunsol Choi

We perform analysis on three multi-answer question answering datasets, which allows us to further study answer set ordering strategies based on the LM's knowledge of each answer.

Hallucination · In-Context Learning +2

Effective Large Language Model Adaptation for Improved Grounding and Citation Generation

no code implementations • 16 Nov 2023 • Xi Ye, Ruoxi Sun, Sercan Ö. Arik, Tomas Pfister

Our framework tunes LLMs to self-ground the claims in their responses and provide accurate citations to retrieved documents.

Language Modelling · Large Language Model +2

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

3 code implementations • 24 Oct 2023 • Zayne Sprague, Xi Ye, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett

We evaluate a range of LLMs and prompting techniques on this dataset and characterize the gaps that remain for techniques like chain-of-thought to perform robust reasoning.

EEL: Efficiently Encoding Lattices for Reranking

1 code implementation • 1 Jun 2023 • Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett

Standard decoding approaches for conditional text generation tasks typically search for an output hypothesis with high model probability, but this may not yield the best hypothesis according to human judgments of quality.

Conditional Text Generation
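
The generic reranking setup that motivates this line of work can be sketched as below; the quality function is a trivial stand-in for a learned scorer, and EEL's actual contribution, efficiently encoding whole lattices of hypotheses, is not shown here.

    def rerank(hypotheses, model_logprobs, quality_fn, alpha=0.5):
        # Combine the generator's log-probability with an external quality score
        # and pick the hypothesis that is best under the combined objective.
        scored = [
            (alpha * lp + (1 - alpha) * quality_fn(h), h)
            for h, lp in zip(hypotheses, model_logprobs)
        ]
        return max(scored)[1]

    # Toy usage: reranking picks the second hypothesis even though it has lower
    # model probability, because the stand-in scorer prefers it.
    hyps = ["the cat sat", "a cat was sitting on the mat"]
    logps = [-1.2, -1.4]
    quality_fn = lambda h: len(h.split()) / 10.0  # stand-in for a learned scorer
    print(rerank(hyps, logps, quality_fn))  # "a cat was sitting on the mat"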

SatLM: Satisfiability-Aided Language Models Using Declarative Prompting

1 code implementation • NeurIPS 2023 • Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett

In this paper, we propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of LLMs.

Arithmetic Reasoning · Language Modelling
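
As a minimal illustration of the satisfiability-aided idea, the sketch below hands a declarative specification to the Z3 SMT solver; the word problem and the constraints are hand-written stand-ins for what, under SatLM, the LLM would generate.

    from z3 import Int, Solver, sat  # pip install z3-solver

    # Hand-written stand-in for constraints that, in a SatLM-style pipeline, an
    # LLM would produce from a word problem such as:
    # "Alice has twice as many apples as Bob. Together they have 18. How many
    #  does Bob have?"
    alice, bob = Int("alice"), Int("bob")

    s = Solver()
    s.add(alice == 2 * bob)
    s.add(alice + bob == 18)

    if s.check() == sat:
        m = s.model()
        print("bob =", m[bob])  # bob = 6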

Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting

1 code implementation • 9 Feb 2023 • Xi Ye, Greg Durrett

We first generate sets of candidate explanations for each example in the prompt using a leave-one-out scheme, then find an effective combination of these explanations with a two-stage framework.

Mathematical Reasoning · Natural Language Inference +1
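
A rough sketch of the leave-one-out candidate generation step is shown below; generate_explanation is a hypothetical placeholder for an LLM call, and the second-stage search over explanation combinations is only summarized in a comment.

    def generate_explanation(target_example, other_examples):
        # Placeholder: in practice, prompt an LLM with `other_examples` and their
        # explanations as demonstrations, asking it to explain `target_example`.
        return f"candidate explanation for {target_example!r}"

    def candidate_explanations(prompt_examples, num_samples=4):
        # Leave-one-out: explain each example using the remaining ones as
        # context, sampling several candidate explanations per example.
        candidates = {}
        for i, ex in enumerate(prompt_examples):
            others = prompt_examples[:i] + prompt_examples[i + 1:]
            candidates[i] = [generate_explanation(ex, others) for _ in range(num_samples)]
        return candidates

    # Stage two (not shown): score combinations of one candidate per example
    # with a proxy metric on unlabeled data, then evaluate the top combinations.
    examples = ["NLI pair 1", "NLI pair 2", "NLI pair 3"]
    print(len(candidate_explanations(examples, num_samples=2)[0]))  # 2 candidates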

Video Prediction by Efficient Transformers

1 code implementation • 12 Dec 2022 • Xi Ye, Guillaume-Alexandre Bilodeau

Video prediction is a challenging computer vision task that has a wide range of applications.

Video Prediction

Complementary Explanations for Effective In-Context Learning

1 code implementation • 25 Nov 2022 • Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, Ramakanth Pasunuru

Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts, but there has been limited understanding of exactly how these explanations function or why they are effective.

In-Context Learning

Assessing Out-of-Domain Language Model Performance from Few Examples

no code implementations • 13 Oct 2022 • Prasann Singhal, Jarad Forristal, Xi Ye, Greg Durrett

We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion: given a few target-domain examples and a set of models with similar training performance, can we understand how these models will perform on OOD test data?

Language Modelling · Natural Language Inference

A unified model for continuous conditional video prediction

1 code implementation • 11 Oct 2022 • Xi Ye, Guillaume-Alexandre Bilodeau

Different conditional video prediction tasks, like video future frame prediction and video frame interpolation, are normally solved by task-related models even though they share many common underlying characteristics.

Video Frame Interpolation · Video Prediction

Diagnosing Ensemble Few-Shot Classifiers

no code implementations • 9 Jun 2022 • Weikai Yang, Xi Ye, Xingxing Zhang, Lanxi Xiao, Jiazhi Xia, Zhongyuan Wang, Jun Zhu, Hanspeter Pfister, Shixia Liu

The base learners and labeled samples (shots) in an ensemble few-shot classifier greatly affect the model performance.

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning

1 code implementation • 6 May 2022 • Xi Ye, Greg Durrett

Does prompting a large language model (LLM) like GPT-3 with explanations improve in-context learning?

In-Context Learning · Language Modelling +3

VPTR: Efficient Transformers for Video Prediction

1 code implementation • 29 Mar 2022 • Xi Ye, Guillaume-Alexandre Bilodeau

Based on this new Transformer block, a fully autoregressive Transformer for video future frame prediction is proposed.

Video Prediction

Can Explanations Be Useful for Calibrating Black Box Models?

2 code implementations • ACL 2022 • Xi Ye, Greg Durrett

Our approach first extracts a set of features combining human intuition about the task with model attributions generated by black box interpretation techniques, then uses a simple calibrator, in the form of a classifier, to predict whether the base model was correct or not.

Extractive Question-Answering · Few-Shot Learning +2
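
The general recipe can be sketched as below: attribution-derived features feed a simple classifier that predicts whether the base model answered correctly. The specific features and data are made up for illustration, and the choice of logistic regression is an assumption rather than the paper's exact calibrator.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy attribution-based features for a handful of predictions (made-up data):
    # e.g. attribution mass on the question tokens, attribution on the predicted
    # answer span, and the base model's own confidence.
    X = np.array([
        [0.62, 0.55, 0.91],
        [0.10, 0.05, 0.88],
        [0.48, 0.40, 0.75],
        [0.05, 0.02, 0.60],
    ])
    y = np.array([1, 0, 1, 0])  # 1 = base model was correct on that example

    calibrator = LogisticRegression().fit(X, y)
    print(calibrator.predict_proba([[0.50, 0.45, 0.80]])[:, 1])  # P(correct)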

RnG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering

1 code implementation • ACL 2022 • Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong

We present RnG-KBQA, a Rank-and-Generate approach for KBQA, which remedies the coverage issue with a generation model while preserving a strong generalization capability.

Entity Linking · Knowledge Base Question Answering +1
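
A very high-level schematic of the rank-and-generate pipeline is sketched below; the scoring and generation functions are toy placeholders for the learned ranker and generator, and the logical-form syntax is illustrative only.

    def rank_candidates(question, candidate_logical_forms, score_fn, top_k=5):
        # Stage 1: score the enumerated candidate logical forms against the
        # question and keep the top-k; `score_fn` stands in for the learned ranker.
        ranked = sorted(candidate_logical_forms,
                        key=lambda lf: score_fn(question, lf), reverse=True)
        return ranked[:top_k]

    def generate_final_form(question, top_candidates, generator_fn):
        # Stage 2: a generation model conditions on the question plus the
        # top-ranked candidates, so it can compose forms beyond the enumerated set.
        return generator_fn(question, top_candidates)

    # Toy usage with trivial stand-ins for both learned components.
    score_fn = lambda q, lf: sum(tok in lf for tok in q.lower().split())
    generator_fn = lambda q, cands: cands[0]  # placeholder: echo the top candidate
    cands = ["(JOIN founders apple_inc)", "(JOIN ceo apple_inc)"]
    top = rank_candidates("who founded apple", cands, score_fn)
    print(generate_final_form("who founded apple", top, generator_fn))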

Connecting Attributions and QA Model Behavior on Realistic Counterfactuals

1 code implementation • EMNLP 2021 • Xi Ye, Rohan Nair, Greg Durrett

When a model attribution technique highlights a particular part of the input, a user might understand this highlight as making a statement about counterfactuals (Miller, 2019): if that part of the input were to change, the model's prediction might change as well.

counterfactual · Machine Reading Comprehension +1

Optimal Neural Program Synthesis from Multimodal Specifications

no code implementations • Findings (EMNLP) 2021 • Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett

Multimodal program synthesis, which leverages different types of user input to synthesize a desired program, is an attractive way to scale program synthesis to challenging settings; however, it requires integrating noisy signals from the user, like natural language, with hard constraints on the program's behavior.

Program Synthesis · valid

Benchmarking Multimodal Regex Synthesis with Complex Structures

no code implementations • ACL 2020 • Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett

Existing datasets for regular expression (regex) generation from natural language are limited in complexity; compared to regex tasks that users post on StackOverflow, the regexes in these datasets are simple, and the language used to describe them is not diverse.

Benchmarking

Sketch-Driven Regular Expression Generation from Natural Language and Examples

1 code implementation • 16 Aug 2019 • Xi Ye, Qiaochu Chen, Xinyu Wang, Isil Dillig, Greg Durrett

Our system achieves state-of-the-art performance on the prior datasets and solves 57% of the real-world dataset, which existing neural systems completely fail on.
