Search Results for author: Ben Bogin

Found 22 papers, 17 papers with code

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

1 code implementation 11 Sep 2024 Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, Tushar Khot

To advance towards this goal, we introduce SUPER, the first benchmark designed to evaluate the capability of LLMs in setting up and executing tasks from research repositories.

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

no code implementations 22 Jul 2024 Ori Yoran, Samuel Joseph Amouyal, Chaitanya Malaviya, Ben Bogin, Ofir Press, Jonathan Berant

Language agents, built on top of language models (LMs), are systems that can interact with complex environments, such as the open web.

Leveraging Code to Improve In-context Learning for Semantic Parsing

1 code implementation 16 Nov 2023 Ben Bogin, Shivanshu Gupta, Peter Clark, Ashish Sabharwal

In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization.

In-Context Learning Semantic Parsing

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

1 code implementation 25 Apr 2023 Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer.

Multi-hop Question Answering Question Answering

Diverse Demonstrations Improve In-context Compositional Generalization

1 code implementation 13 Dec 2022 Itay Levy, Ben Bogin, Jonathan Berant

In-context learning has shown great success in i.i.d. semantic parsing splits, where the training and test sets are drawn from the same distribution.

In-Context Learning Semantic Parsing

Training Vision-Language Models with Less Bimodal Supervision

1 code implementation 1 Nov 2022 Elad Segal, Ben Bogin, Jonathan Berant

We experiment with a high-performing vision-language model, and analyze the effect of bimodal supervision on three vision-language tasks.

Language Modelling

Unobserved Local Structures Make Compositional Generalization Hard

1 code implementation 15 Jan 2022 Ben Bogin, Shivanshu Gupta, Jonathan Berant

While recent work has convincingly shown that sequence-to-sequence models struggle to generalize to new compositions (termed compositional generalization), little is known about what makes compositional generalization hard on a particular test instance.

Semantic Parsing

COVR: A test-bed for Visually Grounded Compositional Generalization with real images

1 code implementation EMNLP 2021 Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant

Due to the automatic generation process, COVR facilitates the creation of compositional splits, where models at test time need to generalize to new concepts and compositions in a zero- or few-shot setting.

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

1 code implementation ACL (NLP4Prog) 2021 Moshe Hazoom, Vibhor Malik, Ben Bogin

Most available semantic parsing datasets, consisting of pairs of natural utterances and logical forms, were collected solely for the purpose of training and evaluation of natural language understanding systems.

Natural Language Understanding Text-To-SQL

Evaluating NLP Models via Contrast Sets

no code implementations 1 Oct 2020 Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou

Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.

Reading Comprehension Sentiment Analysis

Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering

1 code implementation 1 Jul 2020 Ben Bogin, Sanjay Subramanian, Matt Gardner, Jonathan Berant

However, state-of-the-art models in grounded question answering often do not explicitly perform decomposition, leading to difficulties in generalization to out-of-distribution examples.

Inductive Bias Question Answering +1

Obtaining Faithful Interpretations from Compositional Neural Networks

1 code implementation ACL 2020 Sanjay Subramanian, Ben Bogin, Nitish Gupta, Tomer Wolfson, Sameer Singh, Jonathan Berant, Matt Gardner

Neural module networks (NMNs) are a popular approach for modeling compositionality: they achieve high accuracy when applied to problems in language and vision, while reflecting the compositional structure of the problem in the network architecture.

Grammar-based Neural Text-to-SQL Generation

no code implementations 30 May 2019 Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner

The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar.

Text-To-SQL

Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing

1 code implementation ACL 2019 Ben Bogin, Matt Gardner, Jonathan Berant

Research on parsing language to SQL has largely ignored the structure of the database (DB) schema, either because the DB was very simple, or because it was observed at both training and test time.

Decoder Graph Neural Network +2

Emergence of Communication in an Interactive World with Consistent Speakers

1 code implementation 3 Sep 2018 Ben Bogin, Mor Geva, Jonathan Berant

Training agents to communicate with one another given task-based supervision only has attracted considerable attention recently, due to the growing interest in developing models for human-agent interaction.

Towards an argumentative content search engine using weak supervision

no code implementations COLING 2018 Ran Levy, Ben Bogin, Shai Gretz, Ranit Aharonov, Noam Slonim

Our results clearly indicate that the system is able to successfully generalize from the weak signal, outperforming previously reported results in terms of both precision and coverage.

Argument Mining Decision Making +1
