no code implementations • ACL 2022 • Qiang Ning, Ben Zhou, Hao Wu, Haoruo Peng, Chuchu Fan, Matt Gardner
News events are often associated with quantities (e.g., the number of COVID-19 patients or the number of arrests in a protest), and it is often important to extract their type, time, and location from unstructured text in order to analyze these quantity events.
no code implementations • 24 May 2023 • Shivanshu Gupta, Sameer Singh, Matt Gardner
In-context learning (ICL), the ability of large language models to perform novel tasks by conditioning on a prompt with a few task examples, requires demonstrations that are informative about the test instance.
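As one illustration of demonstration selection (a common ICL strategy, not necessarily the selection method this paper proposes), the sketch below picks the pool examples closest to the test instance in embedding space; all names are hypothetical.

```python
import numpy as np

def select_demonstrations(test_emb, pool_embs, pool_examples, k=4):
    """Pick the k pool examples most similar to the test instance (cosine)."""
    # Normalize so dot products are cosine similarities.
    test = test_emb / np.linalg.norm(test_emb)
    pool = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    top = np.argsort(-(pool @ test))[:k]
    return [pool_examples[i] for i in top]
```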
no code implementations • 8 Dec 2022 • Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner
The intermediate supervision is typically manually written, which can be expensive to collect.
1 code implementation • 1 Nov 2022 • Abhilasha Ravichander, Matt Gardner, Ana Marasović
We also have workers make three kinds of edits to the passage -- paraphrasing the negated statement, changing the scope of the negation, and reversing the negation -- resulting in clusters of question-answer pairs that are difficult for models to answer with spurious shortcuts.
1 code implementation • ACL 2022 • Orion Weller, Kevin Seppi, Matt Gardner
We find that there is a simple heuristic for when to use one of these techniques over the other: pairwise MTL is better than STILTs when the target task has fewer instances than the supporting task and vice versa.
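The heuristic is simple enough to state as code; this sketch only restates the rule above (names hypothetical):

```python
def choose_transfer_method(target_size, supporting_size):
    """Heuristic from the paper: pairwise MTL when the target task has
    fewer instances than the supporting task; STILTs otherwise."""
    return "pairwise MTL" if target_size < supporting_size else "STILTs"
```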
1 code implementation • ACL 2022 • Sanjay Subramanian, William Merrill, Trevor Darrell, Matt Gardner, Sameer Singh, Anna Rohrbach
Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.
1 code implementation • ACL 2022 • Yuxiang Wu, Matt Gardner, Pontus Stenetorp, Pradeep Dasigi
We propose to tackle this problem by generating a debiased version of a dataset, which can then be used to train a debiased, off-the-shelf model, by simply replacing its training data.
Ranked #1 on Natural Language Inference on HANS
1 code implementation • 16 Mar 2022 • Shivanshu Gupta, Sameer Singh, Matt Gardner
A growing body of research has demonstrated the inability of NLP models to generalize compositionally and has tried to alleviate it through specialized architectures, training schemes, and data augmentation, among other approaches.
no code implementations • 15 Feb 2022 • Yasaman Razeghi, Robert L. Logan IV, Matt Gardner, Sameer Singh
Pretrained Language Models (LMs) have demonstrated the ability to perform numerical reasoning by extrapolating from a few examples in few-shot settings.
1 code implementation • NAACL 2022 • Akari Asai, Matt Gardner, Hannaneh Hajishirzi
We introduce a multi-task learning framework to jointly generate the final output and predict the evidentiality of each passage, leveraging a new task-agnostic method to obtain silver evidentiality labels for supervision.
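A minimal sketch of such a joint objective, assuming an answer-generation loss plus a per-passage evidentiality classifier trained on silver labels; the weighting term `alpha` and all names are hypothetical, not the paper's exact formulation.

```python
import torch.nn.functional as F

def joint_loss(gen_logits, gen_targets, evid_logits, silver_labels, alpha=0.5):
    """Answer-generation loss plus per-passage evidentiality loss."""
    gen_loss = F.cross_entropy(
        gen_logits.reshape(-1, gen_logits.size(-1)), gen_targets.reshape(-1))
    evid_loss = F.binary_cross_entropy_with_logits(evid_logits, silver_labels)
    return gen_loss + alpha * evid_loss
```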
1 code implementation • EMNLP 2021 • Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant
Due to the automatic generation process, COVR facilitates the creation of compositional splits, where models at test time need to generalize to new concepts and compositions in a zero- or few-shot setting.
no code implementations • 27 Jul 2021 • Anna Rogers, Matt Gardner, Isabelle Augenstein
Alongside the huge volume of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress.
1 code implementation • ACL 2022 • Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner
We craft a set of operations to modify the control codes, which in turn steer generation towards targeted attributes.
1 code implementation • ACL 2021 • Nitish Gupta, Sameer Singh, Matt Gardner
The predominant challenge in weakly supervised semantic parsing is that of spurious programs that evaluate to correct answers for the wrong reasons.
1 code implementation • NAACL 2021 • Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner
Readers of academic research papers often read with the goal of answering specific questions.
Ranked #1 on Question Answering on QASPER
no code implementations • EMNLP 2021 • Dheeru Dua, Pradeep Dasigi, Sameer Singh, Matt Gardner
When training most modern reading comprehension models, all the questions associated with a context are treated as independent of one another.
no code implementations • EMNLP 2021 • Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner
Finally, we conclude with some recommendations for how to create and document web-scale datasets from a scrape of the internet.
no code implementations • EMNLP 2021 • Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh
Compositional reasoning tasks like multi-hop question answering require making latent decisions to get the final answer, given a question.
no code implementations • EMNLP 2021 • Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith
In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems.
no code implementations • EMNLP 2021 • Nitish Gupta, Sameer Singh, Matt Gardner, Dan Roth
Such an objective does not require external supervision for the values of the latent output, or even the end task, yet provides a training signal beyond that provided by individual training examples themselves.
1 code implementation • EMNLP 2021 • Ansong Ni, Matt Gardner, Pradeep Dasigi
We also show that retrieval marginalization results in a 4.1 QA F1 improvement over a non-marginalized baseline on HotpotQA in the fullwiki setting.
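Retrieval marginalization can be sketched as summing the answer likelihood over retrieved contexts rather than conditioning on a single top-ranked one; this is a generic formulation, not the paper's exact code.

```python
import torch

def marginalized_log_prob(retrieval_log_probs, answer_log_probs):
    """log p(a|q) = logsumexp_z [log p(z|q) + log p(a|q,z)],
    summed over the retrieved contexts z."""
    return torch.logsumexp(retrieval_log_probs + answer_log_probs, dim=-1)
```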
1 code implementation • EMNLP 2020 • Orion Weller, Nicholas Lourie, Matt Gardner, Matthew E. Peters
Typically, machine learning systems solve new tasks by training on thousands of examples.
no code implementations • EMNLP 2020 • James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi
However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information.
no code implementations • EMNLP 2020 • Eric Wallace, Matt Gardner, Sameer Singh
Although neural NLP models are highly expressive and empirically successful, they also systematically fail in counterintuitive ways and are opaque in their decision-making process.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Sanjay Subramanian, Lucy Lu Wang, Sachin Mehta, Ben Bogin, Madeleine van Zuylen, Sravanthi Parasa, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi
To address challenges in figure retrieval and figure-to-text alignment, we introduce MedICaT, a dataset of medical images in context.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Inbar Oren, Jonathan Herzig, Nitish Gupta, Matt Gardner, Jonathan Berant
Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently.
1 code implementation • EMNLP 2020 • Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner
Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers.
no code implementations • EMNLP 2020 • Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, Zhen Nie
High-quality and large-scale data are key to success for AI systems.
no code implementations • CRAC (ACL) 2021 • Zhaofeng Wu, Matt Gardner
Despite significant recent progress in coreference resolution, the quality of current state-of-the-art systems still considerably trails behind human-level performance.
no code implementations • ACL 2020 • Robert L. Logan IV, Matt Gardner, Sameer Singh
In addition, we elucidate subtle differences in how importance sampling is applied in these works that can have substantial effects on the final estimates, as well as provide theoretical results which reinforce the validity of this technique.
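For readers unfamiliar with the estimator under discussion, here is the textbook importance-sampling estimate of a log marginal likelihood; the variants the paper compares differ in details not shown here.

```python
import numpy as np

def log_marginal_estimate(log_joint, log_proposal):
    """log p(x) ~= log( (1/N) sum_i p(x, z_i)/q(z_i) ), with z_i drawn from q.
    log_joint[i] = log p(x, z_i); log_proposal[i] = log q(z_i)."""
    log_weights = np.asarray(log_joint) - np.asarray(log_proposal)
    return np.logaddexp.reduce(log_weights) - np.log(len(log_weights))
```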
no code implementations • ACL 2020 • Ananth Gottumukkala, Dheeru Dua, Sameer Singh, Matt Gardner
Building general reading comprehension systems, capable of solving multiple datasets at the same time, is a recent aspirational goal in the research community.
1 code implementation • 1 Jul 2020 • Ben Bogin, Sanjay Subramanian, Matt Gardner, Jonathan Berant
However, state-of-the-art models in grounded question answering often do not explicitly perform decomposition, leading to difficulties in generalization to out-of-distribution examples.
no code implementations • ACL 2020 • Dheeru Dua, Sameer Singh, Matt Gardner
Complex compositional reading comprehension datasets require performing latent sequential decisions that are learned via supervision from the final answer.
1 code implementation • ACL 2020 • Sanjay Subramanian, Ben Bogin, Nitish Gupta, Tomer Wolfson, Sameer Singh, Jonathan Berant, Matt Gardner
Neural module networks (NMNs) are a popular approach for modeling compositionality: they achieve high accuracy when applied to problems in language and vision, while reflecting the compositional structure of the problem in the network architecture.
no code implementations • EMNLP 2020 • Qiang Ning, Hao Wu, Rujun Han, Nanyun Peng, Matt Gardner, Dan Roth
A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated.
Ranked #2 on Question Answering on Torque
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
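Contrast-set evaluation reduces to a consistency metric: a model gets credit for a cluster only if it answers the original example and all of its perturbations correctly. A minimal sketch:

```python
def contrast_consistency(clusters):
    """clusters: list of lists of per-instance correctness booleans,
    one inner list per contrast set (original example + perturbations)."""
    return sum(all(c) for c in clusters) / len(clusters)

print(contrast_consistency([[True, True], [True, False]]))  # 0.5
```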
no code implementations • EMNLP 2020 • Jiangming Liu, Matt Gardner, Shay B. Cohen, Mirella Lapata
Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives.
4 code implementations • TACL 2020 • Tomer Wolfson, Mor Geva, Ankit Gupta, Matt Gardner, Yoav Goldberg, Daniel Deutch, Jonathan Berant
Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer.
2 code implementations • ICLR 2020 • Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner
Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations.
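A toy illustration of the find/filter/count style of composing differentiable modules for such questions (simplified soft modules, not the paper's actual architecture):

```python
import torch

def find(token_scores):                    # per-token relevance probabilities
    return torch.sigmoid(token_scores)

def filter_gt(probs, values, threshold):   # keep tokens whose value exceeds t
    return probs * (values > threshold).float()

def count(probs):                          # expected count = sum of probabilities
    return probs.sum()

probs = find(torch.randn(12))
print(count(filter_gt(probs, torch.rand(12) * 40, 20.0)))
```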
no code implementations • WS 2019 • Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner
Our study suggests that while current metrics may be suitable for existing QA datasets, they limit the complexity of QA datasets that can be created.
no code implementations • WS 2019 • Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min
In this work, we justify a question answering approach to reading comprehension and describe the various kinds of questions one might use to more fully test a system's comprehension of a passage, moving beyond questions that only probe local predicate-argument structures.
no code implementations • WS 2019 • Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt Gardner
Many diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context.
no code implementations • 25 Sep 2019 • Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min
In this opinion piece, we argue that question answering should be considered a format which is sometimes useful for studying particular phenomena, not a phenomenon or task in itself.
1 code implementation • IJCNLP 2019 • Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, Sameer Singh
Neural NLP models are increasingly accurate but remain imperfect and opaque: they break in counterintuitive ways and leave end users puzzled by their behavior.
1 code implementation • IJCNLP 2019 • Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner
The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks.
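One standard way to test numeracy, in the spirit of such probing studies, is to check whether a linear probe can decode a number's magnitude from its embedding; the vectors below are random stand-ins purely to make the sketch runnable.

```python
import numpy as np
from sklearn.linear_model import Ridge

numbers = np.arange(1, 1001, dtype=float)
embeddings = np.random.randn(1000, 300)   # stand-in for real word embeddings
probe = Ridge().fit(embeddings[:800], numbers[:800])
# High held-out R^2 would suggest the embeddings encode magnitude.
print("held-out R^2:", probe.score(embeddings[800:], numbers[800:]))
```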
no code implementations • IJCNLP 2019 • Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark
QuaRTz contains general qualitative statements, e.g., "A sunscreen with a higher SPF protects the skin longer."
1 code implementation • IJCNLP 2019 • Ben Bogin, Matt Gardner, Jonathan Berant
State-of-the-art semantic parsers rely on auto-regressive decoding, emitting one symbol at a time.
1 code implementation • IJCNLP 2019 • Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset.
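The trigger search is gradient-guided; a sketch of the core HotFlip-style candidate-scoring step (a first-order approximation of how much each replacement token would increase the loss) looks roughly like this:

```python
import torch

def hotflip_candidates(trigger_grads, embedding_matrix, k=10):
    """trigger_grads: (trigger_len, dim) loss gradients w.r.t. trigger embeddings.
    embedding_matrix: (vocab_size, dim). Returns top-k replacement ids per slot."""
    scores = trigger_grads @ embedding_matrix.T     # (trigger_len, vocab_size)
    return scores.topk(k, dim=-1).indices
```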
1 code implementation • IJCNLP 2019 • Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner
Machine comprehension of texts longer than a single sentence often requires coreference resolution.
no code implementations • WS 2019 • Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner
A system is presented with a background passage containing at least one of these relations, a novel situation that uses this background, and questions that require reasoning about the effects of the relationships in the background passage in the context of the situation.
1 code implementation • ACL 2019 • Robert Logan, Nelson F. Liu, Matthew E. Peters, Matt Gardner, Sameer Singh
Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge.
1 code implementation • ACL 2019 • Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer
Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs.
no code implementations • NAACL 2019 • Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, Eduard Hovy
Training semantic parsers from question-answer pairs typically involves searching over an exponentially large space of logical forms, and an unguided search can easily be misled by spurious logical forms that coincidentally evaluate to the correct answer.
no code implementations • 30 May 2019 • Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner
The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar.
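A toy contrast with grammar-based decoding, where each step may only emit symbols licensed by the current nonterminal; the grammar fragment here is hypothetical.

```python
# Hypothetical SQL fragment: nonterminals map to allowed productions.
GRAMMAR = {
    "query":  [["SELECT", "column", "FROM", "table"]],
    "column": [["name"], ["age"]],
    "table":  [["people"]],
}

def valid_expansions(symbol):
    """Productions the decoder is allowed to score at this step;
    terminals expand to themselves."""
    return GRAMMAR.get(symbol, [[symbol]])

print(valid_expansions("column"))  # [['name'], ['age']]
```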
1 code implementation • ACL 2019 • Ben Bogin, Matt Gardner, Jonathan Berant
Research on parsing language to SQL has largely ignored the structure of the database (DB) schema, either because the DB was very simple, or because it was observed at both training and test time.
no code implementations • NAACL 2019 • Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith
Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language.
3 code implementations • NAACL 2019 • Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner
We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs.
Ranked #12 on Question Answering on DROP Test
no code implementations • 20 Nov 2018 • Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal
Many natural language questions require recognizing and reasoning with qualitative relationships (e.g., in science, economics, and medicine), but are challenging to answer with corpus-based methods.
no code implementations • EMNLP 2018 • Yang Liu, Matt Gardner, Mirella Lapata
We evaluate this model on two tasks, natural entailment detection and answer sentence selection, and find that modeling latent tree structures results in superior performance.
no code implementations • ACL 2018 • Matt Gardner, Pradeep Dasigi, Srinivasan Iyer, Alane Suhr, Luke Zettlemoyer
Semantic parsing, the study of translating natural language utterances into machine-executable programs, is a well-established research area and has applications in question answering, instruction following, voice assistants, and code generation.
1 code implementation • WS 2018 • Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, Luke Zettlemoyer
This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding.
46 code implementations • NAACL 2018 • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).
Ranked #3 on Citation Intent Classification on ACL-ARC (using extra training data)
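A minimal usage sketch with AllenNLP's Elmo module; the options/weights paths are placeholders for the released pretrained files.

```python
from allennlp.modules.elmo import Elmo, batch_to_ids

# Paths below are placeholders for the released pretrained ELMo files.
elmo = Elmo("elmo_options.json", "elmo_weights.hdf5",
            num_output_representations=1, dropout=0.0)
character_ids = batch_to_ids([["Deep", "contextualized", "representations"]])
output = elmo(character_ids)
embeddings = output["elmo_representations"][0]  # (1, 3, 1024) for the original model
```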
no code implementations • ICLR 2018 • Yang Liu, Matt Gardner
Using a structured attention mechanism, our model matches possible spans in the first sentence to possible spans in the second sentence, simultaneously discovering the tree structure of each sentence and performing a comparison, in a model that is fully differentiable and is trained only on the comparison objective.
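A drastically simplified stand-in for the span-matching step (plain soft attention between span representations, without the structured tree component the paper describes):

```python
import torch

def span_alignment(spans_a, spans_b):
    """spans_a: (num_a, dim); spans_b: (num_b, dim). Each span in the first
    sentence attends over all spans in the second."""
    attn = torch.softmax(spans_a @ spans_b.T, dim=-1)   # (num_a, num_b)
    return attn @ spans_b            # B-aware representation of each A span
```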
1 code implementation • ACL 2018 • Christopher Clark, Matt Gardner
We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input.
Ranked #22 on Question Answering on TriviaQA
1 code implementation • EMNLP 2017 • Jayant Krishnamurthy, Pradeep Dasigi, Matt Gardner
We present a new semantic parsing model for answering compositional questions on semi-structured Wikipedia tables.
no code implementations • WS 2017 • Johannes Welbl, Nelson F. Liu, Matt Gardner
With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions (Dataset available at http://allenai.org/data.html).
1 code implementation • 12 Jul 2016 • Matt Gardner, Jayant Krishnamurthy
However, all prior approaches to open vocabulary semantic parsing replace a formal KB with textual information, making no use of the KB in their models.