1 code implementation • 3 Feb 2025 • Archiki Prasad, Elias Stengel-Eskin, Justin Chih-Yao Chen, Zaid Khan, Mohit Bansal
Unit tests (UTs) play an instrumental role in assessing code correctness as well as providing feedback to a large language model (LLM) as it iteratively debugs faulty code, motivating automated test generation.
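A minimal sketch of the general idea of test-driven debugging with an LLM: run candidate code against generated unit tests and feed the failure messages back to the model. The test format and the `generate_fix` helper are hypothetical placeholders, not the paper's interface.

```python
# Sketch: unit-test feedback loop for iterative LLM debugging (illustrative only).
import traceback

def run_unit_tests(code_str, tests):
    """Execute candidate code against (input, expected) pairs; return failure messages."""
    namespace = {}
    exec(code_str, namespace)  # assumes the candidate code defines `solution(x)`
    failures = []
    for inp, expected in tests:
        try:
            got = namespace["solution"](inp)
            if got != expected:
                failures.append(f"solution({inp!r}) returned {got!r}, expected {expected!r}")
        except Exception:
            failures.append(f"solution({inp!r}) raised:\n{traceback.format_exc()}")
    return failures

def debug_loop(code_str, tests, generate_fix, max_rounds=3):
    """Iteratively pass failing-test feedback to an LLM (placeholder call) until tests pass."""
    for _ in range(max_rounds):
        failures = run_unit_tests(code_str, tests)
        if not failures:
            return code_str                          # all unit tests pass
        code_str = generate_fix(code_str, failures)  # LLM repair step (assumed helper)
    return code_str
```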
1 code implementation • 18 Oct 2024 • Elias Stengel-Eskin, Peter Hase, Mohit Bansal
PBT consistently improves resistance to misinformation and resilience to being challenged while also resulting in the best overall performance on holistic data containing both positive and negative persuasion.
1 code implementation • 8 Oct 2024 • Zaid Khan, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal
Students are iteratively trained and evaluated on generated data, and their feedback (in the form of errors or weak skills) is reported to the agent after each iteration.
1 code implementation • 2 Oct 2024 • Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
Our results on commonsense and math reasoning tasks demonstrate that LASeR can boost iterative LLM optimization by optimizing for multiple RMs, improving the absolute average accuracy of Llama-3-8B across three datasets by 2.67% compared to training with ensemble RM scores, while also showing superior training efficiency (e.g., a 2x speedup).
1 code implementation • 18 Sep 2024 • Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal
Moreover, to ensure effective refinement, we employ a multi-agent loop with three agents: Solver, Reviewer (which generates targeted feedback based on step-wise RM scores), and the Refiner (which incorporates feedback).
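A minimal sketch of a three-agent refinement loop of this shape (Solver, Reviewer, Refiner), assuming a generic `llm(prompt)` call and a step-wise `reward_model`; the prompts and stopping rule are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: Solver -> Reviewer -> Refiner loop driven by step-wise reward-model scores.
def solve_refine(problem, llm, reward_model, num_rounds=3, threshold=0.9):
    solution = llm(f"Solve step by step:\n{problem}")        # Solver: initial attempt
    for _ in range(num_rounds):
        step_scores = reward_model(problem, solution)        # per-step RM scores
        if min(step_scores) >= threshold:
            break                                            # all steps look sound
        feedback = llm(                                      # Reviewer: targeted feedback
            f"Problem:\n{problem}\nSolution:\n{solution}\n"
            f"Step scores: {step_scores}\n"
            "Point out the lowest-scoring steps and explain the likely error."
        )
        solution = llm(                                      # Refiner: incorporate feedback
            f"Problem:\n{problem}\nCurrent solution:\n{solution}\n"
            f"Feedback:\n{feedback}\nRevise the solution accordingly."
        )
    return solution
```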
1 code implementation • 11 Sep 2024 • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
Knowledge conflict arises from discrepancies between information in the context of a large language model (LLM) and the knowledge stored in its parameters.
1 code implementation • 19 Jul 2024 • Swarnadeep Saha, Archiki Prasad, Justin Chih-Yao Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal
To this end, we propose the System-1.x Planner, a controllable planning framework with LLMs that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand.
no code implementations • 27 Jun 2024 • Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal
With this goal in mind, this paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.
1 code implementation • 17 Jun 2024 • Amith Ananthram, Elias Stengel-Eskin, Carl Vondrick, Mohit Bansal, Kathleen McKeown
Moreover, while prompting in the language of a target culture can lead to reductions in bias, it is not a substitute for building AI more representative of the world's languages.
no code implementations • 5 Jun 2024 • Thomas Hofweber, Peter Hase, Elias Stengel-Eskin, Mohit Bansal
We consider both logical coherence norms and coherence norms tied to the strength of belief.
2 code implementations • 31 May 2024 • Elias Stengel-Eskin, Peter Hase, Mohit Bansal
To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener.
1 code implementation • 29 May 2024 • Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal
Specifically, we incorporate multigranularity information into a tree-based representation, allowing VideoTree to extract query-relevant details from long videos in a coarse-to-fine manner.
1 code implementation • 4 May 2024 • Maryam Hashemzadeh, Elias Stengel-Eskin, Sarath Chandar, Marc-Alexandre Cote
While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or scenarios involving continuous, ongoing interaction.
no code implementations • 4 Mar 2024 • David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal
Highlighting particularly relevant regions of an image can improve the performance of vision-language models (VLMs) on various vision-language (VL) tasks by guiding the model to attend more closely to these regions of interest.
no code implementations • 26 Feb 2024 • Haotian Fu, Pratyusha Sharma, Elias Stengel-Eskin, George Konidaris, Nicolas Le Roux, Marc-Alexandre Côté, Xingdi Yuan
We present an algorithm for skill discovery from expert demonstrations.
1 code implementation • 20 Feb 2024 • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers.
1 code implementation • 19 Feb 2024 • Jinhao Duan, Renming Zhang, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Elias Stengel-Eskin, Mohit Bansal, Tianlong Chen, Kaidi Xu
We further characterize the game-theoretic properties of LLMs, such as equilibrium and Pareto Efficiency in repeated games.
1 code implementation • 2 Feb 2024 • Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal
Experiments on seven widely used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models, outperforming several methods that distill from a single teacher and multiple teachers.
1 code implementation • 29 Jan 2024 • Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal
While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality.
1 code implementation • 9 Oct 2023 • Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero- and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs).
1 code implementation • 1 Jun 2023 • Elias Stengel-Eskin, Kyle Rawlins, Benjamin Van Durme
We attempt to address this shortcoming by introducing AmP, a framework, dataset, and challenge for translating ambiguous natural language to formal representations like logic and code.
no code implementations • 29 Mar 2023 • Elias Stengel-Eskin, Benjamin Van Durme
We then examine how confidence scores can help optimize the trade-off between usability and safety.
2 code implementations • CVPR 2023 • Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille
Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle with domain generalization.
2 code implementations • 14 Nov 2022 • Elias Stengel-Eskin, Benjamin Van Durme
Sequence generation models are increasingly being used to translate natural language into programs, i.e., to perform executable semantic parsing.
1 code implementation • 14 Nov 2022 • Elias Stengel-Eskin, Jimena Guallar-Blasco, Yi Zhou, Benjamin Van Durme
Natural language is ambiguous.
1 code implementation • 24 May 2022 • Elias Stengel-Eskin, Benjamin Van Durme
Given the advanced fluency of large generative language models, we ask whether model outputs are consistent with these heuristics, and to what degree different models are consistent with each other.
1 code implementation • 24 May 2022 • Elias Stengel-Eskin, Emmanouil Antonios Platanios, Adam Pauls, Sam Thomson, Hao Fang, Benjamin Van Durme, Jason Eisner, Yu Su
Rejecting class imbalance as the sole culprit, we reveal that the trend is closely associated with an effect we call source signal dilution, where strong lexical cues for the new symbol become diluted as the training dataset grows.
1 code implementation • NAACL 2022 • Chenyu Zhang, Benjamin Van Durme, Zhuowan Li, Elias Stengel-Eskin
Our commonsense knowledge about objects includes their typical visual attributes; we know that bananas are typically yellow or green, and not purple.
Ranked #1 on Visual Commonsense Tests (ViComTe-color)
2 code implementations • Conference On Robot Learning (CoRL) 2021 • Elias Stengel-Eskin, Andrew Hundt, Zhuohong He, Aditya Murali, Nakul Gopalan, Matthew Gombolay, Gregory Hager
Our model completes block manipulation tasks with synthetic commands 530% more often than a UNet-based baseline, and learns to localize actions correctly while creating a mapping of symbols to perceptual input that supports compositional reasoning.
1 code implementation • ICCV 2021 • Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille
Our experiments show CCO substantially boosts the performance of neural symbolic methods on real images.
1 code implementation • 12 Apr 2021 • Elias Stengel-Eskin, Kenton Murray, Sheng Zhang, Aaron Steven White, Benjamin Van Durme
While numerous attempts have been made to jointly parse syntax and semantics, high performance in one domain typically comes at the price of performance in the other.
no code implementations • 1 Jul 2020 • Ryan Culkin, J. Edward Hu, Elias Stengel-Eskin, Guanghui Qin, Benjamin Van Durme
We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment.
no code implementations • ACL 2020 • Elias Stengel-Eskin, Aaron Steven White, Sheng Zhang, Benjamin Van Durme
We introduce a transductive model for parsing into Universal Decompositional Semantics (UDS) representations, which jointly learns to map natural language utterances into UDS graph structures and annotate the graph with decompositional semantic attribute scores.
1 code implementation • LREC 2020 • Aaron Steven White, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, Sheng Zhang, Francis Ferraro, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme
We present the Universal Decompositional Semantics (UDS) dataset (v1.0), which is bundled with the Decomp toolkit (v0.1).
no code implementations • IJCNLP 2019 • Elias Stengel-Eskin, Tzu-Ray Su, Matt Post, Benjamin Van Durme
We introduce a novel discriminative word alignment model, which we integrate into a Transformer-based machine translation model.