Search Results for author: Alane Suhr

Found 32 papers, 20 papers with code

Crowdsourcing Beyond Annotation: Case Studies in Benchmark Data Collection

no code implementations EMNLP (ACL) 2021 Alane Suhr, Clara Vania, Nikita Nangia, Maarten Sap, Mark Yatskar, Samuel R. Bowman, Yoav Artzi

Even though it is such a fundamental tool in NLP, crowdsourcing use is largely guided by common practices and the personal experience of researchers.

Learning Adaptive Parallel Reasoning with Language Models

1 code implementation 21 Apr 2025 Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr

Scaling inference-time computation has substantially improved the reasoning capabilities of language models.

TULIP: Towards Unified Language-Image Pretraining

no code implementations 19 Mar 2025 Zineng Tang, Long Lian, Seun Eisape, Xudong Wang, Roei Herzig, Adam Yala, Alane Suhr, Trevor Darrell, David M. Chan

These models, by performing language alignment, tend to prioritize high-level semantics over visual understanding, weakening their image understanding.

Contrastive Learning Data Augmentation +2

AutoPresent: Designing Structured Visuals from Scratch

1 code implementation CVPR 2025 Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell

We benchmark end-to-end image generation and program generation methods with a variety of models, and find that programmatic methods produce higher-quality slides in user-interactable formats.

Image Generation

Training Software Engineering Agents and Verifiers with SWE-Gym

2 code implementations 30 Dec 2024 Jiayi Pan, Xingyao Wang, Graham Neubig, Navdeep Jaitly, Heng Ji, Alane Suhr, Yizhe Zhang

When combined with our fine-tuned SWE agents, we achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, reflecting a new state-of-the-art for open-weight SWE agents.

Language Modeling Language Modelling

Using Language Models to Disambiguate Lexical Choices in Translation

1 code implementation 8 Nov 2024 Josh Barua, Sanjay Subramanian, Kayo Yin, Alane Suhr

In translation, a concept represented by a single word in a source language can have multiple variations in a target language.

Machine Translation Sentence +1

Grounding Language in Multi-Perspective Referential Communication

1 code implementation 4 Oct 2024 Zineng Tang, Lingjun Mao, Alane Suhr

We introduce a task and dataset for referring expression generation and comprehension in multi-agent embodied environments.

Referring Expression Referring expression generation

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

1 code implementation 14 Jun 2024 Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar

This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents through fine-tuning a pre-trained VLM in two stages: offline RL to initialize the model, followed by offline-to-online RL.

Offline RL

Autonomous Evaluation and Refinement of Digital Agents

1 code implementation 9 Apr 2024 Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control.

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

no code implementations 14 Nov 2023 Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning.

Diversity Imitation Learning +1

What's In My Big Data?

1 code implementation 31 Oct 2023 Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge

We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them.

Benchmarking

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

1 code implementation NeurIPS 2023 Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi

We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness).

Language Modeling Language Modelling +3

Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker

no code implementations 1 Jun 2023 Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov

We present SymbolicToM, a plug-and-play approach to reason about the belief states of multiple characters in reading comprehension tasks via explicit symbolic representation.

Reading Comprehension

We're Afraid Language Models Aren't Modeling Ambiguity

1 code implementation 27 Apr 2023 Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset.

Sentence

Continual Learning for Instruction Following from Realtime Feedback

1 code implementation NeurIPS 2023 Alane Suhr, Yoav Artzi

We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions.

Continual Learning Instruction Following

Abstract Visual Reasoning with Tangram Shapes

no code implementations 29 Nov 2022 Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi

We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines.

Visual Reasoning

Analysis of Language Change in Collaborative Instruction Following

1 code implementation Findings (EMNLP) 2021 Anna Effenberger, Eva Yan, Rhia Singh, Alane Suhr, Yoav Artzi

We analyze language change over time in a collaborative, goal-oriented instructional task, where utility-maximizing participants form conventions and increase their expertise.

Instruction Following

Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

no code implementations 10 Aug 2021 Noriyuki Kojima, Alane Suhr, Yoav Artzi

We study continual learning for natural language instruction generation, by observing human users' instruction execution.

Continual Learning

Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing

no code implementations ACL 2020 Alane Suhr, Ming-Wei Chang, Peter Shaw, Kenton Lee

We study the task of cross-database semantic parsing (XSP), where a system that maps natural language utterances to executable SQL queries is evaluated on databases unseen during training.

Semantic Parsing

Executing Instructions in Situated Collaborative Interactions

no code implementations IJCNLP 2019 Alane Suhr, Claudia Yan, Charlotte Schluger, Stanley Yu, Hadi Khader, Marwa Mouallem, Iris Zhang, Yoav Artzi

We study a collaborative scenario where a user not only instructs a system to complete tasks, but also acts alongside it.

NLVR2 Visual Bias Analysis

1 code implementation 23 Sep 2019 Alane Suhr, Yoav Artzi

We show that the performance of existing models (Li et al., 2019; Tan and Bansal, 2019) is relatively robust to this potential bias.

Sentence

A Corpus for Reasoning About Natural Language Grounded in Photographs

2 code implementations ACL 2019 Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi

We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language.

Diversity Visual Reasoning

Neural Semantic Parsing

no code implementations ACL 2018 Matt Gardner, Pradeep Dasigi, Srinivasan Iyer, Alane Suhr, Luke Zettlemoyer

Semantic parsing, the study of translating natural language utterances into machine-executable programs, is a well-established research area and has applications in question answering, instruction following, voice assistants, and code generation.

Code Generation Instruction Following +4

Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation

1 code implementation ACL 2018 Alane Suhr, Yoav Artzi

We propose a learning approach for mapping context-dependent sequential instructions to actions.

Learning to Map Context-Dependent Sentences to Executable Formal Queries

1 code implementation NAACL 2018 Alane Suhr, Srinivasan Iyer, Yoav Artzi

We propose a context-dependent model to map utterances within an interaction to executable formal queries.

Visual Reasoning with Natural Language

no code implementations 2 Oct 2017 Stephanie Zhou, Alane Suhr, Yoav Artzi

To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world.

Descriptive Diversity +1

A Corpus of Natural Language for Visual Reasoning

no code implementations ACL 2017 Alane Suhr, Mike Lewis, James Yeh, Yoav Artzi

We present a new visual reasoning language dataset, containing 92,244 pairs of examples of natural statements grounded in synthetic images with 3,962 unique sentences.

Question Answering Visual Question Answering (VQA) +1
