1 code implementation • EMNLP 2021 • Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Peter Clark, Yiming Yang, Eduard Hovy
Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence.
no code implementations • Findings (EMNLP) 2021 • Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi
Scripts – prototypical event sequences describing everyday activities – have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information.
no code implementations • 24 May 2023 • Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal
For example, encouraging models to generate a valid answer choice can, in fact, be detrimental to task performance for some LMs, and prior probability normalization methods are less effective (sometimes even detrimental) to instruction-tuned LMs.
no code implementations • 23 May 2023 • Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal
Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability.
no code implementations • 23 May 2023 • Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schütze, Peter Clark
Our goal is to uncover such dependencies and reduce inconsistencies among them, so that answers are supported by faithful, system-believed chains of reasoning drawn from a consistent network of beliefs.
no code implementations • 22 May 2023 • Zhenwen Liang, Wenhao Yu, Tanmay Rajpurohit, Peter Clark, Xiangliang Zhang, Ashwin Kalyan
In this paper, we present a novel approach for distilling math word problem solving capabilities from large language models (LLMs) into smaller, more efficient student models.
1 code implementation • 15 May 2023 • Afra Feyza Akyürek, Ekin Akyürek, Aman Madaan, Ashwin Kalyan, Peter Clark, Derry Wijaya, Niket Tandon
Despite their unprecedented success, even the largest language models make mistakes.
1 code implementation • 30 Mar 2023 • Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark
Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
Ranked #6 on Arithmetic Reasoning on GSM8K
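The refine loop behind Self-Refine is simple enough to sketch. Below, `generate`, `feedback`, and `refine` are hypothetical wrappers around one LLM called with different prompts, and the stopping test is an assumption, not the paper's criterion:

```python
def self_refine(task: str, generate, feedback, refine, max_iters: int = 3) -> str:
    """Iteratively improve an LLM output using the model's own feedback."""
    output = generate(task)                      # initial draft
    for _ in range(max_iters):
        critique = feedback(task, output)        # the same model critiques the draft
        if "no issues" in critique.lower():      # stop criterion (assumption)
            break
        output = refine(task, output, critique)  # revise the draft using the critique
    return output
```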
1 code implementation • 20 Dec 2022 • Yuling Gu, Bhavana Dalvi Mishra, Peter Clark
Using these questions as probes, we observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these everyday things, but do not have fully coherent "parts mental models" (54-59% accurate, 19-43% conditional constraint violation).
1 code implementation • 31 Oct 2022 • Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling.
Ranked #1 on Mathematical Reasoning on Lila (OOD)
1 code implementation • 28 Oct 2022 • Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark
We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language.
no code implementations • 21 Oct 2022 • Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark
Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning.
1 code implementation • 5 Oct 2022 • Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal
On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks.
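A hedged sketch of that decompose-and-delegate pattern: a prompter LLM emits `[handler] sub-question` lines, and each line is routed to a sub-task handler (or back to the LLM itself). The plan format and routing below are illustrative assumptions, not the paper's released interface:

```python
def solve(question: str, llm, handlers: dict) -> str:
    """Decompose a complex question and route each sub-question to a handler."""
    plan = llm(f"Decompose into '[handler] sub-question' lines:\n{question}")
    answers = []
    for step in plan.splitlines():
        if "]" not in step:
            continue
        name, _, sub_q = step.partition("]")
        handler = handlers.get(name.strip("[ "), llm)  # unknown handler: plain LLM
        answers.append(handler(sub_q.strip()))
    return answers[-1] if answers else llm(question)   # last sub-answer answers the whole
```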
no code implementations • 3 Oct 2022 • Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot
In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.
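The selection scheme is easy to state concretely: prefer in-context exemplars whose annotated reasoning chains have the most steps. Counting steps as newline-separated lines is an assumption made for this sketch:

```python
def select_complex_examples(examples, k: int = 8):
    """examples: (question, rationale, answer) triples; keep the k with the
    most reasoning steps, here approximated by lines in the rationale."""
    ranked = sorted(examples, key=lambda ex: len(ex[1].splitlines()), reverse=True)
    return ranked[:k]  # the k most complex chains become the few-shot prompt
```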
2 code implementations • 29 Sep 2022 • Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan
However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data.
1 code implementation • 20 Sep 2022 • Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.
Ranked #3 on Science Question Answering on ScienceQA
no code implementations • 27 Apr 2022 • Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Clark
Our goal is a teachable reasoning system for question-answering (QA), where a user can interact with faithful answer explanations, and correct its errors so that the system improves over time.
1 code implementation • 19 Apr 2022 • Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark
We propose Hard RegSet as a challenging instruction learning task, and a controlled environment for studying instruction learning.
no code implementations • ACL 2022 • Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan
Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems.
1 code implementation • 16 Jan 2022 • Aman Madaan, Niket Tandon, Peter Clark, Yiming Yang
Large LMs such as GPT-3 are powerful, but can commit mistakes that are obvious to humans.
1 code implementation • Findings (NAACL) 2022 • Niket Tandon, Aman Madaan, Peter Clark, Yiming Yang
Our goal is for an LM to continue to improve after deployment, without retraining, using feedback from the user.
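One way to improve after deployment without retraining is to keep a growing memory of user feedback and retrieve relevant entries into future prompts. A minimal sketch, with the similarity measure and prompt template as assumptions:

```python
import difflib

class FeedbackMemory:
    """Store user feedback keyed by the query that triggered it."""
    def __init__(self):
        self.entries = []  # (query, feedback) pairs

    def add(self, query: str, feedback: str):
        self.entries.append((query, feedback))

    def lookup(self, query: str, cutoff: float = 0.6) -> str:
        queries = [q for q, _ in self.entries]
        best = difflib.get_close_matches(query, queries, n=1, cutoff=cutoff)
        return dict(self.entries)[best[0]] if best else ""

def answer(query: str, llm, memory: FeedbackMemory) -> str:
    hint = memory.lookup(query)  # reuse feedback from similar past queries
    return llm(f"Clarification from past feedback: {hint}\n{query}" if hint else query)
```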
1 code implementation • NAACL 2022 • Yuling Gu, Bhavana Dalvi Mishra, Peter Clark
To test this conjecture, we train a new model, DREAM, to answer questions that elaborate the scenes that situated questions are about, and then provide those elaborations as additional context to a question-answering (QA) model.
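That two-stage pipeline (elaborate the scene, then answer with the elaboration as added context) fits in a few lines; both models are hypothetical text-to-text callables:

```python
def answer_with_scene(question: str, elaborator, qa_model) -> str:
    """Elaborate the situation behind a question, then answer in that context."""
    scene = elaborator(f"Describe the scene behind: {question}")
    return qa_model(f"{scene}\n{question}")
```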
1 code implementation • 15 Dec 2021 • Niket Tandon, Aman Madaan, Peter Clark, Keisuke Sakaguchi, Yiming Yang
We present a new dataset, Interscript, containing user feedback on a deployed model that generates complex everyday tasks.
1 code implementation • EMNLP 2021 • Ashwin Kalyan, Abhinav Kumar, Arjun Chandrasekaran, Ashish Sabharwal, Peter Clark
FPs are commonly used in quizzes and interviews to bring out and evaluate the creative reasoning abilities of humans.
no code implementations • EMNLP 2021 • Nora Kassner, Oyvind Tafjord, Hinrich Schütze, Peter Clark
We show that, in a controlled experimental setting, these two mechanisms result in more consistent beliefs in the overall system, improving both the accuracy and consistency of its answers over time.
2 code implementations • 6 Sep 2021 • Oyvind Tafjord, Peter Clark
Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available.
no code implementations • 18 Apr 2021 • Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Yiming Yang, Peter Clark, Keisuke Sakaguchi, Ed Hovy
A class of explainable NLP models for reasoning tasks support their decisions by generating free-form or structured explanations, but what happens when these supporting structures contain errors?
1 code implementation • EMNLP 2021 • Bhavana Dalvi, Peter Jansen, Oyvind Tafjord, Zhengnan Xie, Hannah Smith, Leighanna Pipatanangkura, Peter Clark
Our approach is to generate explanations in the form of entailment trees, namely a tree of multipremise entailment steps from facts that are known, through intermediate conclusions, to the hypothesis of interest (namely the question + answer).
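The tree structure itself is concrete: leaves are known facts, internal nodes are intermediate conclusions entailed by their premises, and the root is the hypothesis. A minimal rendering (the example statements are paraphrased, not drawn from the dataset):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EntailmentNode:
    statement: str
    premises: List["EntailmentNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:  # a known fact rather than a derived conclusion
        return not self.premises

hypothesis = EntailmentNode(
    "An eruption can cause plants to die",
    premises=[EntailmentNode("Eruptions produce ash clouds that block sunlight"),
              EntailmentNode("Plants die without sunlight")])
```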
no code implementations • 16 Apr 2021 • Nora Kassner, Oyvind Tafjord, Hinrich Schütze, Peter Clark
(This is an old and now obsolete draft.)
no code implementations • 16 Apr 2021 • Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi
Scripts – standardized event sequences describing typical everyday activities – have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information.
1 code implementation • CSRR (ACL) 2022 • Dheeraj Rajagopal, Aman Madaan, Niket Tandon, Yiming Yang, Shrimai Prabhumoye, Abhilasha Ravichander, Peter Clark, Eduard Hovy
Recently, models have been shown to predict the effects of unexpected situations, e.g., would cloudy skies help or hinder plant growth?
no code implementations • 5 Feb 2021 • Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark
We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset.
no code implementations • Findings (ACL) 2021 • Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark
In this work we show that a generative model, called ProofWriter, can reliably generate both implications of a theory and the natural language proof(s) that support them.
no code implementations • EMNLP 2020 • Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy
Our solution is a new task formulation where, given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary.
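That open-vocabulary tuple format maps directly onto a record type; the example values below are illustrative:

```python
from typing import NamedTuple

class StateChange(NamedTuple):
    entity: str
    attribute: str
    before: str
    after: str

# e.g., for a step like "The water evaporates" in a water-cycle paragraph:
change = StateChange(entity="water", attribute="location",
                     before="ocean", after="atmosphere")
```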
1 code implementation • EMNLP 2020 • Harsh Jhamtani, Peter Clark
The third dataset eOBQA is constructed by adding explanation annotations to the OBQA dataset to test generalization of models trained on eQASC.
Ranked #1 on Reasoning Chain Explanations on eQASC
1 code implementation • NAACL 2021 • Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal
We propose a general framework called Text Modular Networks (TMNs) for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
no code implementations • 12 Jun 2020 • Sumithra Bhakthavatsalam, Kyle Richardson, Niket Tandon, Peter Clark
We present a new knowledge-base of hasPart relationships, extracted from a large corpus of generic statements.
1 code implementation • NeurIPS 2020 • Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, Jonathan Berant
In this work, we provide a first demonstration that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
no code implementations • 8 May 2020 • Peter Clark, John Thompson, Bruce Porter
From a modeling perspective, knowledge patterns provide an important insight into the structure of a formal ontology: rather than viewing a formal ontology simply as a list of terms and axioms, the knowledge-pattern approach views it as a collection of abstract, modular theories (the "knowledge patterns") plus a collection of modeling decisions stating how different aspects of the world can be modeled using those theories.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Dheeraj Rajagopal, Niket Tandon, Bhavana Dalvi, Peter Clark, Eduard Hovy
We address the task of explaining the effects of perturbations in procedural text, an important test of process comprehension.
no code implementations • 2 May 2020 • Sumithra Bhakthavatsalam, Chloe Anastasiades, Peter Clark
We present a new resource for the NLP community, namely a large (3.5M+ sentence) knowledge base of *generic statements*, e.g., "Trees remove carbon dioxide from the atmosphere", collected from multiple corpora.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi
As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats.
Ranked #1 on Question Answering on CommonsenseQA
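The trick that lets one model span formats is a single flat text encoding per instance. The separators and lowercasing below only approximate the released UnifiedQA recipe; treat the details as assumptions:

```python
def encode(question: str, context: str = "", choices=None) -> str:
    """Flatten any QA instance to one 'question \\n options \\n context' string."""
    parts = [question.lower()]
    if choices:   # multiple-choice: inline the lettered options
        parts.append(" ".join(f"({chr(97 + i)}) {c.lower()}"
                              for i, c in enumerate(choices)))
    if context:   # extractive / open-book: append the passage
        parts.append(context.lower())
    return " \\n ".join(parts)  # literal '\n' separator, per the released data
```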
2 code implementations • 14 Feb 2020 • Peter Clark, Oyvind Tafjord, Kyle Richardson
However, expressing the knowledge in a formal (logical or probabilistic) representation has been a major obstacle to this research.
no code implementations • IJCNLP 2019 • Niket Tandon, Bhavana Dalvi, Keisuke Sakaguchi, Peter Clark, Antoine Bosselut
We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text.
no code implementations • WS 2019 • Simon Ostermann, Sheng Zhang, Michael Roth, Peter Clark
This paper reports on the results of the shared tasks of the COIN workshop at EMNLP-IJCNLP 2019.
1 code implementation • 25 Oct 2019 • Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal
Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges.
1 code implementation • IJCNLP 2019 • Tushar Khot, Ashish Sabharwal, Peter Clark
We propose jointly training a model to simultaneously fill this knowledge gap and compose it with the provided partial knowledge.
no code implementations • IJCNLP 2019 • Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark
Our goal is to better comprehend procedural text, e.g., a paragraph about photosynthesis, by not only predicting what happens, but why some actions need to happen before others.
1 code implementation • 10 Sep 2019 • Niket Tandon, Bhavana Dalvi Mishra, Keisuke Sakaguchi, Antoine Bosselut, Peter Clark
We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text.
no code implementations • IJCNLP 2019 • Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark
QuaRTz contains general qualitative statements, e.g., "A sunscreen with a higher SPF protects the skin longer."
no code implementations • 4 Sep 2019 • Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz
This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions.
no code implementations • WS 2019 • Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner
A system is presented a background passage containing at least one of these relations, a novel situation that uses this background, and questions that require reasoning about effects of the relationships in the background passage in the context of the situation.
1 code implementation • LREC 2020 • Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark
Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately.
1 code implementation • NAACL 2019 • Xinya Du, Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark, Claire Cardie
Our goal is procedural text comprehension, namely tracking how the properties of entities (e.g., their location) change with time given a procedural text (e.g., a paragraph about photosynthesis, a recipe).
1 code implementation • 1 May 2019 • Arindam Mitra, Peter Clark, Oyvind Tafjord, Chitta Baral
While machine learning (ML) based approaches have in recent years become the popular way to build end-to-end question answering systems, such systems often struggle when additional knowledge is needed to correctly answer the questions.
no code implementations • 20 Nov 2018 • Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal
Many natural language questions require recognizing and reasoning with qualitative relationships (e.g., in science, economics, and medicine), but are challenging to answer with corpus-based methods.
1 code implementation • ACL 2019 • Souvik Kundu, Tushar Khot, Ashish Sabharwal, Peter Clark
To capture additional context, PathNet also composes the passage representations along each path to compute a passage-based representation.
1 code implementation • EMNLP 2018 • Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal
Our oracle experiments designed to circumvent the knowledge retrieval bottleneck demonstrate the value of both the open book and additional facts.
1 code implementation • EMNLP 2018 • Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark
Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered.
no code implementations • EMNLP 2018 • Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Peter Clark
We focus on filling these knowledge gaps in the Science Entailment task, by leveraging an external structured knowledge base (KB) of science facts.
no code implementations • 10 Jun 2018 • Peter Clark
The analysis ignores shallow statistical matching techniques between T and H, and instead asks: What would it take to reasonably infer that T implies H?
no code implementations • NAACL 2018 • Bhavana Dalvi Mishra, Lifu Huang, Niket Tandon, Wen-tau Yih, Peter Clark
The new dataset, ProPara, is the first to contain natural (rather than machine-generated) text about a changing world along with a full annotation of entity states (location and existence) during those changes (81k datapoints).
Ranked #4 on Procedural Text Understanding on ProPara
no code implementations • 15 Apr 2018 • Peter Clark, Bhavana Dalvi, Niket Tandon
To supply this knowledge, we leverage VerbNet to build a rulebase (called the Semantic Lexicon) of the preconditions and effects of actions, and use it along with commonsense knowledge of persistence to answer questions about change.
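The shape of such a rulebase is simple: each verb maps to preconditions and effects, and unmatched states persist. The entries below are invented examples, not the actual Semantic Lexicon contents:

```python
SEMANTIC_LEXICON = {
    "eat":  {"preconditions": ["agent is hungry", "food is present"],
             "effects":       ["food is consumed", "agent is not hungry"]},
    "move": {"preconditions": ["agent is at origin"],
             "effects":       ["agent is at destination", "agent is not at origin"]},
}

def effects_of(verb: str) -> list:
    rule = SEMANTIC_LEXICON.get(verb)
    return rule["effects"] if rule else []  # persistence: no rule, nothing changes
```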
no code implementations • 14 Mar 2018 • Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord
We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering.
no code implementations • 13 Feb 2018 • Peter Clark
This working note discusses the topic of story generation, with a view to identifying the knowledge required to understand aviation incident narratives (which have structural similarities to stories), following the premise that to understand aviation incidents, one should at least be able to generate examples of them.
no code implementations • CONLL 2017 • Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Marco A. Valenzuela-Escárcega, Peter Clark, Michael Hammond
We propose a neural network architecture for QA that reranks answer justifications as an intermediate (and human-interpretable) step in answer selection.
Ranked #1 on Question Answering on AI2 Kaggle Dataset
no code implementations • CL 2017 • Peter Jansen, Rebecca Sharp, Mihai Surdeanu, Peter Clark
Our best configuration answers 44% of the questions correctly, where the top justifications for 57% of these correct answers contain a compelling human-readable justification that explains the inference required to arrive at the correct answer.
1 code implementation • ACL 2017 • Tushar Khot, Ashish Sabharwal, Peter Clark
While there has been substantial progress in factoid question-answering (QA), answering complex questions remains challenging, typically requiring both a large body of knowledge and inference techniques.
no code implementations • TACL 2017 • Bhavana Dalvi Mishra, Niket Tandon, Peter Clark
Our goal is to construct a domain-targeted, high precision knowledge base (KB), containing general (subject, predicate, object) statements about the world, in support of a downstream question-answering (QA) application.
no code implementations • COLING 2016 • Peter Jansen, Niranjan Balasubramanian, Mihai Surdeanu, Peter Clark
These explanations are used to create a fine-grained categorization of the requirements.
no code implementations • EMNLP 2016 • Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Peter Clark, Michael Hammond
We argue that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined through task-specific embeddings.
no code implementations • 20 Apr 2016 • Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren Etzioni, Dan Roth
We propose a structured inference system for this task, formulated as an Integer Linear Program (ILP), that answers natural language questions using a semi-structured knowledge base derived from text, including questions requiring multi-step inference and a combination of multiple facts.
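A toy version of that support-graph selection, using the off-the-shelf PuLP solver: pick alignment edges linking question terms through facts to an answer, maximizing score under a sparsity budget. The real system has many more constraint families; the scores and constraints here are invented:

```python
import pulp

# alignment scores for candidate edges: question -> fact -> answer
scores = {("q", "f1"): 0.9, ("q", "f2"): 0.4, ("f1", "a"): 0.8, ("f2", "a"): 0.7}

prob = pulp.LpProblem("support_graph", pulp.LpMaximize)
use = {e: pulp.LpVariable(f"use_{e[0]}_{e[1]}", cat="Binary") for e in scores}

prob += pulp.lpSum(scores[e] * use[e] for e in scores)  # maximize total alignment
prob += pulp.lpSum(use.values()) <= 2                   # sparsity budget
prob += use[("q", "f1")] <= use[("f1", "a")]            # a used fact must
prob += use[("q", "f2")] <= use[("f2", "a")]            # connect to the answer

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([e for e in scores if use[e].value() == 1])       # -> [('q','f1'), ('f1','a')]
```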
3 code implementations • 14 Apr 2016 • Carissa Schoenick, Peter Clark, Oyvind Tafjord, Peter Turney, Oren Etzioni
Given recent successes in AI (e.g., AlphaGo's victory against Lee Sedol in the game of Go), it's become increasingly important to assess: how close are AI systems to human-level intelligence?
no code implementations • 10 Jul 2015 • Tushar Khot, Niranjan Balasubramanian, Eric Gribkoff, Ashish Sabharwal, Peter Clark, Oren Etzioni
In the first, we simply use the extracted science rules directly as MLN clauses.
no code implementations • TACL 2015 • Daniel Fried, Peter Jansen, Gustave Hahn-Powell, Mihai Surdeanu, Peter Clark
We introduce a higher-order formalism that allows all these lexical semantic models to chain direct evidence to construct indirect associations between question and answer texts, by casting the task as the traversal of graphs that encode direct term associations.
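The "chaining direct evidence into indirect associations" idea reduces to graph traversal. A minimal breadth-first sketch over a toy association graph (both the graph and the hop limit are illustrative):

```python
from collections import deque

def connected(graph: dict, q_term: str, a_term: str, max_hops: int = 3) -> bool:
    """Can a question term reach an answer term by chaining direct associations?"""
    frontier, seen = deque([(q_term, 0)]), {q_term}
    while frontier:
        term, hops = frontier.popleft()
        if term == a_term:
            return True
        if hops < max_hops:
            for nbr in graph.get(term, ()):
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, hops + 1))
    return False

graph = {"rain": ["cloud", "water"], "water": ["evaporation"]}
print(connected(graph, "rain", "evaporation"))  # True: rain -> water -> evaporation
```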