no code implementations • EMNLP (ACL) 2021 • Alane Suhr, Clara Vania, Nikita Nangia, Maarten Sap, Mark Yatskar, Samuel R. Bowman, Yoav Artzi
Even though crowdsourcing is a fundamental tool in NLP, its use is largely guided by common practices and the personal experience of researchers.
1 code implementation • 21 Apr 2025 • Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr
Scaling inference-time computation has substantially improved the reasoning capabilities of language models.
no code implementations • 19 Mar 2025 • Zineng Tang, Long Lian, Seun Eisape, Xudong Wang, Roei Herzig, Adam Yala, Alane Suhr, Trevor Darrell, David M. Chan
By performing language alignment, these models tend to prioritize high-level semantics over visual detail, weakening their image understanding.
1 code implementation • CVPR 2025 • Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell
We benchmark end-to-end image generation and program generation methods with a variety of models, and find that programmatic methods produce higher-quality slides in user-interactable formats.
2 code implementations • 30 Dec 2024 • Jiayi Pan, Xingyao Wang, Graham Neubig, Navdeep Jaitly, Heng Ji, Alane Suhr, Yizhe Zhang
When combined with our fine-tuned SWE agents, we achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, reflecting a new state-of-the-art for open-weight SWE agents.
1 code implementation • CVPR 2025 • Lingjun Mao, Zineng Tang, Alane Suhr
We study the perception of color illusions by vision-language models.
1 code implementation • 8 Nov 2024 • Josh Barua, Sanjay Subramanian, Kayo Yin, Alane Suhr
In translation, a concept represented by a single word in a source language can have multiple variations in a target language.
1 code implementation • 4 Oct 2024 • Zineng Tang, Lingjun Mao, Alane Suhr
We introduce a task and dataset for referring expression generation and comprehension in multi-agent embodied environments.
1 code implementation • 14 Jun 2024 • Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents through fine-tuning a pre-trained VLM in two stages: offline RL to initialize the model, followed by offline-to-online RL.
no code implementations • 16 May 2024 • Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann Lecun, Yi Ma, Sergey Levine
Finally, our framework uses these task rewards to fine-tune the entire VLM with RL.
1 code implementation • 9 Apr 2024 • Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control.
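One simple way a domain-general automatic evaluator can improve an agent, as a minimal hypothetical sketch, is best-of-n trajectory selection: sample several candidate trajectories and keep the one the evaluator scores highest. The evaluator and trajectory representations below are illustrative assumptions, not the paper's actual interfaces.

```python
# Hypothetical sketch: best-of-n trajectory selection with an automatic
# evaluator. The toy evaluator and action strings are assumptions for
# illustration, not the paper's models or API.
from typing import Callable, List


def best_of_n(
    candidates: List[List[str]],
    evaluator: Callable[[List[str]], float],
) -> List[str]:
    """Return the candidate trajectory the evaluator scores highest."""
    return max(candidates, key=evaluator)


# Toy evaluator: prefers trajectories that end by confirming task success.
def score(trajectory: List[str]) -> float:
    return 1.0 if trajectory and trajectory[-1] == "task_complete" else 0.0


chosen = best_of_n(
    [["click", "type", "stop"], ["click", "type", "task_complete"]],
    score,
)
print(chosen)
```

The same evaluator signal could instead be used as a reward for fine-tuning; selection at inference time is just the simplest instantiation.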
no code implementations • 14 Nov 2023 • Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr
To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning.
1 code implementation • 31 Oct 2023 • Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge
We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them.
1 code implementation • 17 Oct 2023 • Melanie Sclar, Yejin Choi, Yulia Tsvetkov, Alane Suhr
In this work, we focus on LLM sensitivity to a quintessential class of meaning-preserving design choices: prompt formatting.
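To make the class of design choices concrete, here is a minimal hypothetical sketch of meaning-preserving prompt-format variants: the same field and value rendered with different separators and casings. The specific perturbations (separator, casing) are assumptions for illustration, not the paper's exact perturbation set.

```python
# Hypothetical sketch: enumerate semantically equivalent prompts that differ
# only in surface formatting (separator and field casing are assumed
# perturbation axes, not the paper's definitive set).
from itertools import product
from typing import Iterator


def format_variants(field: str, value: str) -> Iterator[str]:
    """Yield prompts with identical meaning but varied formatting."""
    separators = [": ", " - ", ":\n"]
    casings = [str.lower, str.upper, str.title]
    for sep, case in product(separators, casings):
        yield f"{case(field)}{sep}{value}"


variants = list(format_variants("passage", "The cat sat on the mat."))
for v in variants[:3]:
    print(repr(v))
```

Measuring an LLM's accuracy across such a set of variants exposes how much performance hinges on formatting alone.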
1 code implementation • NeurIPS 2023 • Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness).
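The two properties above can be sketched in a few lines: one weighted reward per generated segment, combined from several feedback-type-specific reward models. The toy reward models and weights here are assumptions for illustration, not the trained models from the paper.

```python
# Hypothetical sketch of fine-grained rewards: per-segment density plus a
# weighted combination of multiple reward models. The scorers and weights
# below are illustrative assumptions, not the paper's reward models.
from typing import Callable, List


def combined_segment_rewards(
    segments: List[str],
    reward_models: List[Callable[[str], float]],
    weights: List[float],
) -> List[float]:
    """Return one weighted reward per segment (e.g., per sentence)."""
    return [
        sum(w * rm(seg) for rm, w in zip(reward_models, weights))
        for seg in segments
    ]


# Toy scorers standing in for factuality and relevance reward models.
def factuality(segment: str) -> float:
    return 1.0 if "fact" in segment else 0.0


def relevance(segment: str) -> float:
    return 1.0 if len(segment) > 10 else 0.5


rewards = combined_segment_rewards(
    ["A short claim.", "A fact-supported sentence."],
    [factuality, relevance],
    [0.6, 0.4],
)
print(rewards)
```

Because each segment gets its own reward, credit assignment during RL is much denser than with a single sequence-level score.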
no code implementations • 1 Jun 2023 • Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov
We present SymbolicToM, a plug-and-play approach to reason about the belief states of multiple characters in reading comprehension tasks via explicit symbolic representation.
1 code implementation • 27 Apr 2023 • Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset.
1 code implementation • 28 Jan 2023 • Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, Roy Fox
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
1 code implementation • NeurIPS 2023 • Alane Suhr, Yoav Artzi
We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions.
no code implementations • 29 Nov 2022 • Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi
We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines.
1 code implementation • Findings (EMNLP) 2021 • Anna Effenberger, Eva Yan, Rhia Singh, Alane Suhr, Yoav Artzi
We analyze language change over time in a collaborative, goal-oriented instructional task, where utility-maximizing participants form conventions and increase their expertise.
no code implementations • 10 Aug 2021 • Noriyuki Kojima, Alane Suhr, Yoav Artzi
We study continual learning for natural language instruction generation, by observing human users' instruction execution.
no code implementations • ACL 2020 • Alane Suhr, Ming-Wei Chang, Peter Shaw, Kenton Lee
We study the task of cross-database semantic parsing (XSP), where a system that maps natural language utterances to executable SQL queries is evaluated on databases unseen during training.
no code implementations • IJCNLP 2019 • Alane Suhr, Claudia Yan, Charlotte Schluger, Stanley Yu, Hadi Khader, Marwa Mouallem, Iris Zhang, Yoav Artzi
We study a collaborative scenario where a user not only instructs a system to complete tasks, but also acts alongside it.
1 code implementation • 23 Sep 2019 • Alane Suhr, Yoav Artzi
We show that the performance of existing models (Li et al., 2019; Tan and Bansal, 2019) is relatively robust to this potential bias.
4 code implementations • CVPR 2019 • Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi
We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task.
Ranked #11 on Vision and Language Navigation on the Touchdown Dataset.
2 code implementations • ACL 2019 • Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi
We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language.
no code implementations • ACL 2018 • Matt Gardner, Pradeep Dasigi, Srinivasan Iyer, Alane Suhr, Luke Zettlemoyer
Semantic parsing, the study of translating natural language utterances into machine-executable programs, is a well-established research area and has applications in question answering, instruction following, voice assistants, and code generation.
1 code implementation • ACL 2018 • Alane Suhr, Yoav Artzi
We propose a learning approach for mapping context-dependent sequential instructions to actions.
1 code implementation • NAACL 2018 • Alane Suhr, Srinivasan Iyer, Yoav Artzi
We propose a context-dependent model to map utterances within an interaction to executable formal queries.
no code implementations • 2 Oct 2017 • Stephanie Zhou, Alane Suhr, Yoav Artzi
To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world.
no code implementations • ACL 2017 • Alane Suhr, Mike Lewis, James Yeh, Yoav Artzi
We present a new visual reasoning language dataset, containing 92,244 pairs of examples of natural statements grounded in synthetic images with 3,962 unique sentences.