Search Results for author: Elias Stengel-Eskin

Found 36 papers, 27 papers with code

Learning to Generate Unit Tests for Automated Debugging

1 code implementation • 3 Feb 2025 • Archiki Prasad, Elias Stengel-Eskin, Justin Chih-Yao Chen, Zaid Khan, Mohit Bansal

Unit tests (UTs) play an instrumental role in assessing code correctness as well as providing feedback to a large language model (LLM) as it iteratively debugs faulty code, motivating automated test generation.

Large Language Model

Teaching Models to Balance Resisting and Accepting Persuasion

1 code implementation • 18 Oct 2024 • Elias Stengel-Eskin, Peter Hase, Mohit Bansal

PBT consistently improves resistance to misinformation and resilience to being challenged while also resulting in the best overall performance on holistic data containing both positive and negative persuasion.

Misinformation

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

1 code implementation • 8 Oct 2024 • Zaid Khan, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal

Students are iteratively trained and evaluated on generated data, and their feedback (in the form of errors or weak skills) is reported to the agent after each iteration.

Math Sequential Decision Making +1
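The teacher-student loop described above can be sketched as a toy Python loop: a "teacher" agent generates training items, the student is trained and evaluated, and the student's remaining errors are reported back so the next batch targets weak skills. All names and the skill model here are illustrative assumptions, not the DataEnvGym implementation.

```python
# Toy sketch of a teacher-student data-generation loop (illustrative only).
def run_loop(skills, iterations=3):
    mastered = set()                       # student state: skills learned so far
    feedback = set(skills)                 # initially, every skill is weak
    for _ in range(iterations):
        batch = sorted(feedback)[:2]       # teacher targets reported weak skills
        mastered.update(batch)             # "train" the student on generated data
        feedback = set(skills) - mastered  # errors reported back to the teacher
    return mastered

print(run_loop(["algebra", "geometry", "logic"]))
```

In this toy version the loop converges once every skill has appeared in a targeted batch; the real environment replaces the stubs with actual model training and evaluation.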

LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits

1 code implementation • 2 Oct 2024 • Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Our results on commonsense and math reasoning tasks demonstrate that LASeR can boost iterative LLM optimization by optimizing for multiple RMs, improving the absolute average accuracy of Llama-3-8B over three datasets by 2.67% over training with ensemble RM scores while also showing superior training efficiency (e.g., a 2x speedup).

Instruction Following Math +1
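The multi-armed-bandit view of reward-model (RM) selection can be illustrated with a simple epsilon-greedy loop over stub RMs. Everything here is an assumption for the sketch (the RMs are constants, and the update rule is a plain running mean), not LASeR's actual algorithm.

```python
# Illustrative epsilon-greedy bandit over several reward models (RMs).
import random

def select_rm(values, eps=0.1):
    if random.random() < eps:
        return random.randrange(len(values))             # explore a random RM
    return max(range(len(values)), key=values.__getitem__)  # exploit best RM

def update(counts, values, arm, reward):
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # running-mean estimate

random.seed(0)
counts, values = [0, 0, 0], [0.0, 0.0, 0.0]
true_quality = [0.2, 0.8, 0.5]   # hidden usefulness of each RM (made up)
for _ in range(200):
    arm = select_rm(values)
    update(counts, values, arm, true_quality[arm])

best = max(range(3), key=values.__getitem__)
print(best)  # the bandit settles on the most useful RM
```

The design point is the same as in the abstract: rather than committing to one RM (or averaging an ensemble), the learner adaptively concentrates queries on whichever RM has proven most useful so far.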

MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning

1 code implementation • 18 Sep 2024 • Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal

Moreover, to ensure effective refinement, we employ a multi-agent loop with three agents: Solver, Reviewer (which generates targeted feedback based on step-wise RM scores), and the Refiner (which incorporates feedback).

Math
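The Solver/Reviewer/Refiner loop described above can be sketched with stub functions: the Solver proposes stepwise answers with scores, the Reviewer flags the weakest step, and the Refiner patches it until all steps score well. The agents and scores below are toy stand-ins; in MAgICoRe they are LLMs guided by step-wise reward-model scores.

```python
# Toy sketch of a three-agent, iterative refinement loop (illustrative only).
def solve():
    return [("step1", 0.9), ("step2", 0.3), ("step3", 0.8)]  # (step, RM score)

def review(steps):
    return min(range(len(steps)), key=lambda i: steps[i][1])  # weakest step index

def refine(steps, idx):
    step, _ = steps[idx]
    return steps[:idx] + [(step + "-fixed", 1.0)] + steps[idx + 1:]

steps = solve()
for _ in range(5):                  # iterative, coarse-to-fine refinement
    weakest = review(steps)
    if steps[weakest][1] >= 0.8:    # stop once every step scores well enough
        break
    steps = refine(steps, weakest)

print([s for s, _ in steps])
```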

AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge

1 code implementation • 11 Sep 2024 • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Knowledge conflict arises from discrepancies between information in the context of a large language model (LLM) and the knowledge stored in its parameters.

Language Modelling Large Language Model +1

System-1.x: Learning to Balance Fast and Slow Planning with Language Models

1 code implementation • 19 Jul 2024 • Swarnadeep Saha, Archiki Prasad, Justin Chih-Yao Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal

To this end, we propose the System-1.x Planner, a controllable planning framework with LLMs that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand.

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

no code implementations • 27 Jun 2024 • Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal

With this goal in mind, this paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.

Model Editing Philosophy

See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding

1 code implementation • 17 Jun 2024 • Amith Ananthram, Elias Stengel-Eskin, Carl Vondrick, Mohit Bansal, Kathleen McKeown

Moreover, while prompting in the language of a target culture can lead to reductions in bias, it is not a substitute for building AI more representative of the world's languages.

Are language models rational? The case of coherence norms and belief revision

no code implementations • 5 Jun 2024 • Thomas Hofweber, Peter Hase, Elias Stengel-Eskin, Mohit Bansal

We consider both logical coherence norms as well as coherence norms tied to the strength of belief.

LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models

2 code implementations • 31 May 2024 • Elias Stengel-Eskin, Peter Hase, Mohit Bansal

To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener.

TriviaQA TruthfulQA

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

1 code implementation • 29 May 2024 • Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal

Specifically, we incorporate multigranularity information into a tree-based representation, allowing VideoTree to extract query-relevant details from long videos in a coarse-to-fine manner.

EgoSchema MME +3

Sub-goal Distillation: A Method to Improve Small Language Agents

1 code implementation • 4 May 2024 • Maryam Hashemzadeh, Elias Stengel-Eskin, Sarath Chandar, Marc-Alexandre Cote

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks.

Imitation Learning Knowledge Distillation +1

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training

no code implementations • 4 Mar 2024 • David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal

Highlighting particularly relevant regions of an image can improve the performance of vision-language models (VLMs) on various vision-language (VL) tasks by guiding the model to attend more closely to these regions of interest.

Math Phrase Grounding +3

Soft Self-Consistency Improves Language Model Agents

1 code implementation • 20 Feb 2024 • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers.

Language Modeling Language Modelling +2
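The majority-voting baseline mentioned above, and a "soft" alternative that weights answers by model probability rather than raw counts, can be contrasted in a few lines. This is an illustrative simplification under assumed inputs (answer strings plus per-sample probabilities), not the paper's implementation.

```python
# Hard vs. soft answer selection over sampled answers (illustrative only).
from collections import Counter

def majority_vote(answers):
    # standard self-consistency: pick the most frequent sampled answer
    return Counter(answers).most_common(1)[0][0]

def soft_select(answers, probs):
    # soft variant: aggregate probability mass per answer instead of counts
    scores = {}
    for a, p in zip(answers, probs):
        scores[a] = scores.get(a, 0.0) + p
    return max(scores, key=scores.get)

samples = ["42", "42", "41"]
print(majority_vote(samples))                 # wins by count
print(soft_select(samples, [0.2, 0.1, 0.9]))  # wins by probability mass
```

The two selectors can disagree: a single high-confidence sample can outweigh several low-confidence duplicates, which is exactly the failure mode of plain majority voting that soft scoring addresses.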

MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models

1 code implementation • 2 Feb 2024 • Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal

Experiments on seven widely used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models, outperforming several methods that distill from a single teacher and multiple teachers.

Language Modelling Large Language Model +1

ReGAL: Refactoring Programs to Discover Generalizable Abstractions

1 code implementation • 29 Jan 2024 • Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal

While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality.

Date Understanding Math +2

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

1 code implementation • 9 Oct 2023 • Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero- and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs).

Language Modelling Question Answering +2

Zero and Few-shot Semantic Parsing with Ambiguous Inputs

1 code implementation • 1 Jun 2023 • Elias Stengel-Eskin, Kyle Rawlins, Benjamin Van Durme

We attempt to address this shortcoming by introducing AmP, a framework, dataset, and challenge for translating ambiguous natural language to formal representations like logic and code.

Semantic Parsing

Did You Mean...? Confidence-based Trade-offs in Semantic Parsing

no code implementations • 29 Mar 2023 • Elias Stengel-Eskin, Benjamin Van Durme

We then examine how confidence scores can help optimize the trade-off between usability and safety.

Semantic Parsing

Calibrated Interpretation: Confidence Estimation in Semantic Parsing

2 code implementations • 14 Nov 2022 • Elias Stengel-Eskin, Benjamin Van Durme

Sequence generation models are increasingly being used to translate natural language into programs, i.e., to perform executable semantic parsing.

Semantic Parsing

The Curious Case of Control

1 code implementation • 24 May 2022 • Elias Stengel-Eskin, Benjamin Van Durme

Given the advanced fluency of large generative language models, we ask whether model outputs are consistent with these heuristics, and to what degree different models are consistent with each other.

When More Data Hurts: A Troubling Quirk in Developing Broad-Coverage Natural Language Understanding Systems

1 code implementation • 24 May 2022 • Elias Stengel-Eskin, Emmanouil Antonios Platanios, Adam Pauls, Sam Thomson, Hao Fang, Benjamin Van Durme, Jason Eisner, Yu Su

Rejecting class imbalance as the sole culprit, we reveal that the trend is closely associated with an effect we call source signal dilution, where strong lexical cues for the new symbol become diluted as the training dataset grows.

Intent Recognition Natural Language Understanding +1

Visual Commonsense in Pretrained Unimodal and Multimodal Models

1 code implementation • NAACL 2022 • Chenyu Zhang, Benjamin Van Durme, Zhuowan Li, Elias Stengel-Eskin

Our commonsense knowledge about objects includes their typical visual attributes; we know that bananas are typically yellow or green, and not purple.

Attribute Visual Commonsense Tests +1

Guiding Multi-Step Rearrangement Tasks with Natural Language Instructions

2 code implementations • Conference on Robot Learning (CoRL) 2021 • Elias Stengel-Eskin, Andrew Hundt, Zhuohong He, Aditya Murali, Nakul Gopalan, Matthew Gombolay, Gregory Hager

Our model completes block manipulation tasks with synthetic commands 530% more often than a UNet-based baseline, and learns to localize actions correctly while creating a mapping of symbols to perceptual input that supports compositional reasoning.

Instruction Following

Joint Universal Syntactic and Semantic Parsing

1 code implementation • 12 Apr 2021 • Elias Stengel-Eskin, Kenton Murray, Sheng Zhang, Aaron Steven White, Benjamin Van Durme

While numerous attempts have been made to jointly parse syntax and semantics, high performance in one domain typically comes at the price of performance in the other.

Semantic Parsing

Iterative Paraphrastic Augmentation with Discriminative Span Alignment

no code implementations • 1 Jul 2020 • Ryan Culkin, J. Edward Hu, Elias Stengel-Eskin, Guanghui Qin, Benjamin Van Durme

We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment.

Sentence

Universal Decompositional Semantic Parsing

no code implementations • ACL 2020 • Elias Stengel-Eskin, Aaron Steven White, Sheng Zhang, Benjamin Van Durme

We introduce a transductive model for parsing into Universal Decompositional Semantics (UDS) representations, which jointly learns to map natural language utterances into UDS graph structures and annotate the graph with decompositional semantic attribute scores.

Attribute Prediction +1

A Discriminative Neural Model for Cross-Lingual Word Alignment

no code implementations • IJCNLP 2019 • Elias Stengel-Eskin, Tzu-Ray Su, Matt Post, Benjamin Van Durme

We introduce a novel discriminative word alignment model, which we integrate into a Transformer-based machine translation model.

Machine Translation NER +2
