no code implementations • Findings (EMNLP) 2021 • Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi
Scripts – prototypical event sequences describing everyday activities – have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information.
1 code implementation • EMNLP 2021 • Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Peter Clark, Yiming Yang, Eduard Hovy
Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence.
1 code implementation • 11 Sep 2024 • Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, Tushar Khot
To advance towards this goal, we introduce SUPER, the first benchmark designed to evaluate the capability of LLMs in setting up and executing tasks from research repositories.
1 code implementation • 1 Jul 2024 • Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark
Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets?
1 code implementation • 10 Jun 2024 • Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, Peter Clark
However, developing and evaluating an AI agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible.
no code implementations • 10 Jun 2024 • Ruoyao Wang, Graham Todd, Ziang Xiao, Xingdi Yuan, Marc-Alexandre Côté, Peter Clark, Peter Jansen
Can current language models themselves serve as world simulators, correctly predicting how actions change different world states, thus bypassing the need for extensive manual coding?
1 code implementation • 30 May 2024 • Li Zhang, Peter Jansen, Tianyi Zhang, Peter Clark, Chris Callison-Burch, Niket Tandon
A recent, promising line of work uses LLMs to generate a formal representation of the environment that can be solved by a symbolic planner.
1 code implementation • 25 May 2024 • Nathaniel Weir, Muhammad Khalifa, Linlu Qiu, Orion Weller, Peter Clark
CoGEX works by (1) training LMs to generate their own pseudo-programs, (2) teaching them to emulate their generated program's execution, including the execution of leaf functions, allowing the LM's knowledge to fill in the execution gaps, and (3) using them to search over many programs to find an optimal one.
no code implementations • 29 Feb 2024 • Tianyi Zhang, Li Zhang, Zhaoyi Hou, Ziyu Wang, Yuling Gu, Peter Clark, Chris Callison-Burch, Niket Tandon
Planning in a text-based environment continues to be a major challenge for AI systems.
no code implementations • 22 Feb 2024 • Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme
Recent language models enable new opportunities for structured reasoning with text, such as the construction of intuitive, proof-like textual entailment trees without relying on brittle formal logic.
no code implementations • 21 Feb 2024 • Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Sanchaita Hazra, Ashish Sabharwal, Peter Clark
With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially.
1 code implementation • 5 Feb 2024 • Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox
We evaluate our method in the classic videogame NetHack and the text environment ScienceWorld to demonstrate SSO's ability to optimize a set of skills and perform in-context policy improvement.
1 code implementation • 12 Jan 2024 • Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe
In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data, even performing as well as oracle models finetuned on hard data.
no code implementations • 12 Dec 2023 • Peter Clark, Bhavana Dalvi Mishra, Oyvind Tafjord
This shows the clear progression of models towards improved factual accuracy and entailment reasoning, and the dataset provides a new benchmark that more cleanly separates and quantifies these two notions.
no code implementations • 16 Nov 2023 • Yash Kumar Lal, Li Zhang, Faeze Brahman, Bodhisattwa Prasad Majumder, Peter Clark, Niket Tandon
Our approach is to test several simple multi-LLM-agent architectures for customization, as well as an end-to-end LLM, using a new evaluation set, called CustomPlans, of over 200 WikiHow procedures each with a customization need.
1 code implementation • 16 Nov 2023 • Ben Bogin, Shivanshu Gupta, Peter Clark, Ashish Sabharwal
In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization.
no code implementations • 16 Nov 2023 • Yuling Gu, Oyvind Tafjord, Peter Clark
While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood.
1 code implementation • 8 Nov 2023 • Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot
Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment.
1 code implementation • 8 Nov 2023 • Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot
Our experiments with ChatGPT-3.5 show that this bias is ubiquitous (80% of our personas demonstrate bias), significant (some datasets show performance drops of 70%+), and especially harmful for certain groups (some personas suffer statistically significant drops on 80%+ of the datasets).
no code implementations • 6 Nov 2023 • Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan
In this work, we address the shortcomings of quantitative metrics by proposing QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
no code implementations • 16 Oct 2023 • Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark
Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning.
1 code implementation • 24 May 2023 • Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal
For example, both normalization and prompting methods for reducing SFC can be ineffective or even detrimental to task performance for some LMs.
no code implementations • 23 May 2023 • Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark
To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that answers are supported by interpretable chains of reasoning drawn from a consistent network of beliefs.
no code implementations • 23 May 2023 • Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal
Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability.
no code implementations • 22 May 2023 • Zhenwen Liang, Wenhao Yu, Tanmay Rajpurohit, Peter Clark, Xiangliang Zhang, Ashwin Kalyan
In this paper, we present a novel approach for distilling math word problem solving capabilities from large language models (LLMs) into smaller, more efficient student models.
1 code implementation • 15 May 2023 • Afra Feyza Akyürek, Ekin Akyürek, Aman Madaan, Ashwin Kalyan, Peter Clark, Derry Wijaya, Niket Tandon
Despite their unprecedented success, even the largest language models make mistakes.
3 code implementations • NeurIPS 2023 • Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark
Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
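The generate-feedback-refine loop described in the Self-Refine snippet can be sketched as follows. This is a minimal illustration of the control flow only, not the authors' implementation: `generate`, `feedback`, and `refine` stand in for three prompts to the same LLM, and are replaced here by toy deterministic functions so the loop is runnable.

```python
# Toy sketch of the Self-Refine loop: the same model generates an output,
# critiques it, and revises it until the critique says it is good enough.

def generate(task):
    # initial draft (stand-in for an LLM call)
    return "draft: " + task

def feedback(output):
    # critique the output; None signals "good enough"
    return "add detail" if "revised" not in output else None

def refine(output, critique):
    # apply the critique (stand-in for an LLM call)
    return output + " (revised: " + critique + ")"

def self_refine(task, max_iters=3):
    output = generate(task)
    for _ in range(max_iters):
        critique = feedback(output)
        if critique is None:  # the model accepts its own output
            break
        output = refine(output, critique)
    return output

print(self_refine("summarize the paper"))
```

The key design point the paper highlights is that no extra training is needed: the same model plays all three roles via different prompts.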
1 code implementation • 20 Dec 2022 • Yuling Gu, Bhavana Dalvi Mishra, Peter Clark
Using these questions as probes, we observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these everyday things, but do not have fully coherent "parts mental models" (54-59% accurate, 19-43% conditional constraint violation).
1 code implementation • 31 Oct 2022 • Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling.
Ranked #1 on Mathematical Reasoning on Lila (OOD)
1 code implementation • 28 Oct 2022 • Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark
We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language.
no code implementations • 21 Oct 2022 • Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark
Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning.
1 code implementation • 5 Oct 2022 • Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal
On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks.
no code implementations • 3 Oct 2022 • Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot
In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.
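The selection scheme in the complexity-based prompting snippet can be sketched in a few lines. This is a toy illustration under the assumption that "complexity" is measured by the number of reasoning steps in a candidate chain-of-thought demonstration; the candidate strings are invented.

```python
# Toy sketch of complexity-based example selection: among candidate
# chain-of-thought demonstrations, prefer those with the most reasoning
# steps (here, lines beginning with "Step").

def num_steps(example):
    return sum(1 for line in example.splitlines() if line.startswith("Step"))

def select_complex(candidates, k=2):
    # keep the k demonstrations with the most reasoning steps
    return sorted(candidates, key=num_steps, reverse=True)[:k]

candidates = [
    "Q: ...\nStep 1: ...\nA: ...",
    "Q: ...\nStep 1: ...\nStep 2: ...\nStep 3: ...\nA: ...",
    "Q: ...\nStep 1: ...\nStep 2: ...\nA: ...",
]
prompt_examples = select_complex(candidates, k=2)
```

The selected examples would then be concatenated into the few-shot prompt before the test question.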
2 code implementations • 29 Sep 2022 • Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan
However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data.
1 code implementation • 20 Sep 2022 • Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.
Ranked #6 on Science Question Answering on ScienceQA
no code implementations • 16 Sep 2022 • Nathaniel Weir, Peter Clark, Benjamin Van Durme
Our goal is a modern approach to answering questions via systematic reasoning where answers are supported by human interpretable proof trees grounded in an NL corpus of authoritative facts.
no code implementations • 27 Apr 2022 • Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Clark
Our goal is a teachable reasoning system for question-answering (QA), where a user can interact with faithful answer explanations, and correct its errors so that the system improves over time.
1 code implementation • 19 Apr 2022 • Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark
We propose Hard RegSet as a challenging instruction learning task, and a controlled environment for studying instruction learning.
no code implementations • ACL 2022 • Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan
Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems.
1 code implementation • 16 Jan 2022 • Aman Madaan, Niket Tandon, Peter Clark, Yiming Yang
Large LMs such as GPT-3 are powerful, but can commit mistakes that are obvious to humans.
1 code implementation • Findings (NAACL) 2022 • Niket Tandon, Aman Madaan, Peter Clark, Yiming Yang
Our goal is for an LM to continue to improve after deployment, without retraining, using feedback from the user.
1 code implementation • NAACL 2022 • Yuling Gu, Bhavana Dalvi Mishra, Peter Clark
To test this conjecture, we train a new model, DREAM, to answer questions that elaborate the scenes that situated questions are about, and then provide those elaborations as additional context to a question-answering (QA) model.
1 code implementation • 15 Dec 2021 • Niket Tandon, Aman Madaan, Peter Clark, Keisuke Sakaguchi, Yiming Yang
We present a new dataset, Interscript, containing user feedback on a deployed model that generates complex everyday tasks.
1 code implementation • EMNLP 2021 • Ashwin Kalyan, Abhinav Kumar, Arjun Chandrasekaran, Ashish Sabharwal, Peter Clark
FPs are commonly used in quizzes and interviews to bring out and evaluate the creative reasoning abilities of humans.
1 code implementation • 24 Oct 2021 • Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Peter Clark, Yiming Yang, Eduard Hovy
Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence.
no code implementations • EMNLP 2021 • Nora Kassner, Oyvind Tafjord, Hinrich Schütze, Peter Clark
We show that, in a controlled experimental setting, these two mechanisms result in more consistent beliefs in the overall system, improving both the accuracy and consistency of its answers over time.
2 code implementations • 6 Sep 2021 • Oyvind Tafjord, Peter Clark
Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available.
no code implementations • 18 Apr 2021 • Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Yiming Yang, Peter Clark, Keisuke Sakaguchi, Ed Hovy
A class of explainable NLP models for reasoning tasks support their decisions by generating free-form or structured explanations, but what happens when these supporting structures contain errors?
1 code implementation • EMNLP 2021 • Bhavana Dalvi, Peter Jansen, Oyvind Tafjord, Zhengnan Xie, Hannah Smith, Leighanna Pipatanangkura, Peter Clark
Our approach is to generate explanations in the form of entailment trees, namely a tree of multipremise entailment steps from facts that are known, through intermediate conclusions, to the hypothesis of interest (namely the question + answer).
no code implementations • 16 Apr 2021 • Nora Kassner, Oyvind Tafjord, Hinrich Schütze, Peter Clark
(This is an old and now obsolete draft.)
no code implementations • 16 Apr 2021 • Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi
Scripts - standardized event sequences describing typical everyday activities - have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information.
1 code implementation • CSRR (ACL) 2022 • Dheeraj Rajagopal, Aman Madaan, Niket Tandon, Yiming Yang, Shrimai Prabhumoye, Abhilasha Ravichander, Peter Clark, Eduard Hovy
Recently, models have been shown to predict the effects of unexpected situations, e.g., would cloudy skies help or hinder plant growth?
no code implementations • 5 Feb 2021 • Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark
We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset.
no code implementations • Findings (ACL) 2021 • Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark
In this work we show that a generative model, called ProofWriter, can reliably generate both implications of a theory and the natural language proof(s) that support them.
no code implementations • EMNLP 2020 • Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy
Our solution is a new task formulation where, given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary.
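The open-vocabulary output format described in this entry can be sketched as a simple record type: one (entity, attribute, before-state, after-state) tuple per step of the procedure. The water/photosynthesis values below are illustrative, not drawn from the dataset.

```python
# One state-change record per step: what entity changed, which attribute,
# and its value before and after the step.
from collections import namedtuple

StateChange = namedtuple("StateChange", "entity attribute before after")

# Illustrative predictions for two steps of a photosynthesis paragraph
step_changes = [
    StateChange("water", "location", "soil", "root"),
    StateChange("water", "location", "root", "leaf"),
]

for c in step_changes:
    print(f"{c.entity}: {c.attribute} {c.before} -> {c.after}")
```

Because all four fields are free text, this formulation avoids committing to a fixed schema of entities or attributes.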
1 code implementation • EMNLP 2020 • Harsh Jhamtani, Peter Clark
The third dataset eOBQA is constructed by adding explanation annotations to the OBQA dataset to test generalization of models trained on eQASC.
Ranked #1 on Reasoning Chain Explanations on eQASC
1 code implementation • NAACL 2021 • Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal
We propose a general framework called Text Modular Networks(TMNs) for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
no code implementations • 12 Jun 2020 • Sumithra Bhakthavatsalam, Kyle Richardson, Niket Tandon, Peter Clark
We present a new knowledge-base of hasPart relationships, extracted from a large corpus of generic statements.
1 code implementation • NeurIPS 2020 • Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, Jonathan Berant
In this work, we provide a first demonstration that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
no code implementations • 8 May 2020 • Peter Clark, John Thompson, Bruce Porter
From a modeling perspective, knowledge patterns provide an important insight into the structure of a formal ontology: rather than viewing a formal ontology simply as a list of terms and axioms, the knowledge-pattern view treats it as a collection of abstract, modular theories (the "knowledge patterns"), plus a collection of modeling decisions stating how different aspects of the world can be modeled using those theories.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Dheeraj Rajagopal, Niket Tandon, Bhavana Dalvi, Peter Clark, Eduard Hovy
We address the task of explaining the effects of perturbations in procedural text, an important test of process comprehension.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi
As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats.
Ranked #5 on Common Sense Reasoning on WinoGrande
no code implementations • 2 May 2020 • Sumithra Bhakthavatsalam, Chloe Anastasiades, Peter Clark
We present a new resource for the NLP community, namely a large (3.5M+ sentence) knowledge base of *generic statements*, e.g., "Trees remove carbon dioxide from the atmosphere", collected from multiple corpora.
2 code implementations • 14 Feb 2020 • Peter Clark, Oyvind Tafjord, Kyle Richardson
However, expressing the knowledge in a formal (logical or probabilistic) representation has been a major obstacle to this research.
no code implementations • WS 2019 • Simon Ostermann, Sheng Zhang, Michael Roth, Peter Clark
This paper reports on the results of the shared tasks of the COIN workshop at EMNLP-IJCNLP 2019.
no code implementations • IJCNLP 2019 • Niket Tandon, Bhavana Dalvi, Keisuke Sakaguchi, Peter Clark, Antoine Bosselut
We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text.
1 code implementation • 25 Oct 2019 • Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal
Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges.
1 code implementation • IJCNLP 2019 • Tushar Khot, Ashish Sabharwal, Peter Clark
We propose jointly training a model to simultaneously fill this knowledge gap and compose it with the provided partial knowledge.
no code implementations • IJCNLP 2019 • Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark
Our goal is to better comprehend procedural text, e.g., a paragraph about photosynthesis, by not only predicting what happens, but why some actions need to happen before others.
1 code implementation • 10 Sep 2019 • Niket Tandon, Bhavana Dalvi Mishra, Keisuke Sakaguchi, Antoine Bosselut, Peter Clark
We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text.
no code implementations • IJCNLP 2019 • Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark
QuaRTz contains general qualitative statements, e.g., "A sunscreen with a higher SPF protects the skin longer."
no code implementations • 4 Sep 2019 • Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz
This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions.
no code implementations • WS 2019 • Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner
A system is presented a background passage containing at least one of these relations, a novel situation that uses this background, and questions that require reasoning about effects of the relationships in the background passage in the context of the situation.
1 code implementation • LREC 2020 • Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark
Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately.
1 code implementation • NAACL 2019 • Xinya Du, Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark, Claire Cardie
Our goal is procedural text comprehension, namely tracking how the properties of entities (e.g., their location) change with time given a procedural text (e.g., a paragraph about photosynthesis, a recipe).
1 code implementation • 1 May 2019 • Arindam Mitra, Peter Clark, Oyvind Tafjord, Chitta Baral
While in recent years machine learning (ML) based approaches have been the popular approach in developing end-to-end question answering systems, such systems often struggle when additional knowledge is needed to correctly answer the questions.
no code implementations • 20 Nov 2018 • Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal
Many natural language questions require recognizing and reasoning with qualitative relationships (e.g., in science, economics, and medicine), but are challenging to answer with corpus-based methods.
1 code implementation • ACL 2019 • Souvik Kundu, Tushar Khot, Ashish Sabharwal, Peter Clark
To capture additional context, PathNet also composes the passage representations along each path to compute a passage-based representation.
1 code implementation • EMNLP 2018 • Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal
Our oracle experiments designed to circumvent the knowledge retrieval bottleneck demonstrate the value of both the open book and additional facts.
Ranked #26 on Question Answering on OpenBookQA
1 code implementation • EMNLP 2018 • Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark
Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered.
no code implementations • EMNLP 2018 • Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Peter Clark
We focus on filling these knowledge gaps in the Science Entailment task, by leveraging an external structured knowledge base (KB) of science facts.
no code implementations • 10 Jun 2018 • Peter Clark
The analysis ignores shallow statistical matching techniques between T and H, and rather asks: What would it take to reasonably infer that T implies H?
no code implementations • NAACL 2018 • Bhavana Dalvi Mishra, Lifu Huang, Niket Tandon, Wen-tau Yih, Peter Clark
The new dataset, ProPara, is the first to contain natural (rather than machine-generated) text about a changing world along with a full annotation of entity states (location and existence) during those changes (81k datapoints).
Ranked #4 on Procedural Text Understanding on ProPara
no code implementations • 15 Apr 2018 • Peter Clark, Bhavana Dalvi, Niket Tandon
To supply this knowledge, we leverage VerbNet to build a rulebase (called the Semantic Lexicon) of the preconditions and effects of actions, and use it along with commonsense knowledge of persistence to answer questions about change.
1 code implementation • 14 Mar 2018 • Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord
We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering.
no code implementations • 13 Feb 2018 • Peter Clark
This working note discusses the topic of story generation, with a view to identifying the knowledge required to understand aviation incident narratives (which have structural similarities to stories), following the premise that to understand aviation incidents, one should at least be able to generate examples of them.
no code implementations • CONLL 2017 • Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Marco A. Valenzuela-Escárcega, Peter Clark, Michael Hammond
We propose a neural network architecture for QA that reranks answer justifications as an intermediate (and human-interpretable) step in answer selection.
Ranked #1 on Question Answering on AI2 Kaggle Dataset
no code implementations • CL 2017 • Peter Jansen, Rebecca Sharp, Mihai Surdeanu, Peter Clark
Our best configuration answers 44% of the questions correctly, where the top justifications for 57% of these correct answers contain a compelling human-readable justification that explains the inference required to arrive at the correct answer.
1 code implementation • ACL 2017 • Tushar Khot, Ashish Sabharwal, Peter Clark
While there has been substantial progress in factoid question-answering (QA), answering complex questions remains challenging, typically requiring both a large body of knowledge and inference techniques.
no code implementations • TACL 2017 • Bhavana Dalvi Mishra, Niket Tandon, Peter Clark
Our goal is to construct a domain-targeted, high precision knowledge base (KB), containing general (subject, predicate, object) statements about the world, in support of a downstream question-answering (QA) application.
no code implementations • COLING 2016 • Peter Jansen, Niranjan Balasubramanian, Mihai Surdeanu, Peter Clark
These explanations are used to create a fine-grained categorization of the requirements.
no code implementations • EMNLP 2016 • Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Peter Clark, Michael Hammond
We argue that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined through task-specific embeddings.
no code implementations • 20 Apr 2016 • Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren Etzioni, Dan Roth
We propose a structured inference system for this task, formulated as an Integer Linear Program (ILP), that answers natural language questions using a semi-structured knowledge base derived from text, including questions requiring multi-step inference and a combination of multiple facts.
3 code implementations • 14 Apr 2016 • Carissa Schoenick, Peter Clark, Oyvind Tafjord, Peter Turney, Oren Etzioni
Given recent successes in AI (e.g., AlphaGo's victory over Lee Sedol in the game of Go), it's become increasingly important to assess: how close are AI systems to human-level intelligence?
no code implementations • 10 Jul 2015 • Tushar Khot, Niranjan Balasubramanian, Eric Gribkoff, Ashish Sabharwal, Peter Clark, Oren Etzioni
In the first, we simply use the extracted science rules directly as MLN clauses.
no code implementations • TACL 2015 • Daniel Fried, Peter Jansen, Gustave Hahn-Powell, Mihai Surdeanu, Peter Clark
We introduce a higher-order formalism that allows all these lexical semantic models to chain direct evidence to construct indirect associations between question and answer texts, by casting the task as the traversal of graphs that encode direct term associations.