In addition to conditional answers, the dataset also features: (1) long context documents with information that is related in logically complex ways; (2) multi-hop questions that require compositional logical reasoning; (3) a combination of extractive questions, yes/no questions, questions with multiple answers, and not-answerable questions; (4) questions asked without knowing the answers.
However, more than 20% of relational tables on the web have 20 or more rows (Cafarella et al., 2008), and these large tables present a challenge for current Transformer models, which are typically limited to 512 tokens.
We introduce a diagnostic dataset aimed at probing LMs for factual knowledge that changes over time and highlight problems with LMs at either end of the spectrum -- those trained on specific slices of temporal data, as well as those trained on a wide range of temporal data.
We propose a new model, DocHopper, that iteratively attends to different parts of long, hierarchically structured documents to answer complex questions.
Ranked #2 on Question Answering on ConditionalQA
Here we study using such LMs to fill in entities in human-authored comparative questions, like ``Which country is older, India or ______?''
We present the Open Predicate Query Language (OPQL), a method for constructing a virtual KB (VKB) trained entirely from text.
While many methods purport to explain predictions by highlighting salient features, what precise aims these explanations serve and how to evaluate their utility are often unstated.
As a step towards making commonsense reasoning research more realistic, we propose to study open-ended commonsense reasoning (OpenCSR) -- the task of answering a commonsense question without any pre-defined choices -- using as a resource only a corpus of commonsense facts written in natural language.
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Massive language models are the core of modern NLP modeling and have been shown to encode impressive amounts of commonsense and factual information.
We address this problem with a novel query embedding (QE) method that is more faithful to deductive reasoning, and show that this leads to better performance on complex queries to incomplete KBs.
In particular, we describe a neural module, DrKIT, that traverses textual data like a KB, softly following paths of relations between mentions of entities in the corpus.
The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems.
We introduce PubMedQA, a novel biomedical question answering (QA) dataset collected from PubMed abstracts.
Automatically constructed datasets for generating text from semi-structured data (tables), such as WikiBio, often contain reference texts that diverge from the information in the corresponding semi-structured data.
We present efficient differentiable implementations of second-order multi-hop reasoning using a large symbolic knowledge base (KB).
Large knowledge bases (KBs) are useful for many AI tasks, but are difficult to integrate into modern gradient-based learning systems.
Moreover, Transformer-XL is up to 1,800+ times faster than vanilla Transformer during evaluation.
We focus on a setting in which a corpus is supplemented with a large but incomplete KB, and on questions that require non-trivial (e.g., ``multi-hop'') reasoning.
For this we use the pre-trained LMs as fixed feature extractors and restrict the downstream task models to have no additional sequence modeling layers.
Current state-of-the-art question answering models reason over an entire passage, not incrementally.
We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers.
Ranked #59 on Question Answering on HotpotQA
In this paper we look at a more practical setting, namely QA over the combination of a KB and entity-linked text, which is appropriate when an incomplete KB is available with a large text corpus.
Many problems in NLP require aggregating information from multiple mentions of the same entity which may be far apart in the text.
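One common way to aggregate information across mentions of the same entity is smooth pooling of per-mention scores, e.g. with log-sum-exp. The sketch below illustrates that general idea only; the scores and pooling choice are illustrative assumptions, not this paper's specific architecture.

```python
import numpy as np

def logsumexp(x):
    # Numerically stable log-sum-exp.
    m = x.max()
    return float(m + np.log(np.exp(x - m).sum()))

# Hypothetical per-mention scores an answer candidate receives at
# each of its mentions, which may be far apart in the document.
mention_scores = np.array([1.2, -0.3, 2.5])

# Pool mention scores into one entity-level score: logsumexp acts
# like a soft max, so every mention still contributes gradient.
entity_score = logsumexp(mention_scores)
print(round(entity_score, 3))
```

The pooled score is slightly above the best single mention, reflecting accumulated evidence from all mentions.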
Ranked #6 on Question Answering on WikiHop
Existing end-to-end deep QA models (Miller et al., 2016; Weston et al., 2014) need to read the entire text after observing the question, and therefore their complexity in responding to a question is linear in the text size.
Though deep neural networks have great success in natural language processing, they are limited at more knowledge intensive AI tasks, such as open-domain Question Answering (QA).
We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck.
Ranked #8 on Language Modelling on WikiText-2
We present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as TensorFlow or Theano.
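The core compilation idea can be sketched in a few lines: each KB relation becomes an entity-by-entity matrix, and chaining relations in a rule becomes matrix multiplication, which is differentiable. The toy KB and rule below are illustrative, not TensorLog's actual API.

```python
import numpy as np

# Toy KB over entities {0: alice, 1: bob, 2: carol}.
entities = ["alice", "bob", "carol"]
n = len(entities)

# Each relation is an n x n matrix: M[i, j] = 1 iff
# relation(entity_i, entity_j) holds in the KB.
parent = np.zeros((n, n))
parent[0, 1] = 1.0  # parent(alice, bob)
parent[1, 2] = 1.0  # parent(bob, carol)

# The rule grandparent(X, Z) :- parent(X, Y), parent(Y, Z)
# compiles to a product of relation matrices, so answering a
# query is a differentiable vector-matrix computation.
x = np.zeros(n)
x[0] = 1.0                    # one-hot vector for the query entity alice
answer = x @ parent @ parent  # scores over candidate answers
print(entities[int(answer.argmax())])  # carol
```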
Semi-supervised learning methods based on generative adversarial networks (GANs) obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time.
Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks.
Ranked #9 on Part-Of-Speech Tagging on Penn Treebank
We introduce a model that encodes such graphs as explicit memory in recurrent neural networks, and use it to model coreference relations in text.
Ranked #1 on Question Answering on CNN / Daily Mail
We propose a general approach to modeling semi-supervised learning (SSL) algorithms.
The focus of past machine learning research for Reading Comprehension tasks has been primarily on the design of novel deep learning architectures.
We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model.
In this framework, we train a generative model to generate questions based on the unlabeled text, and combine model-generated questions with human-generated questions for training question answering models.
Previous work combines word-level and character-level representations using concatenation or scalar weighting, which is suboptimal for high-level tasks like reading comprehension.
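A per-dimension gate is one way to combine word- and character-level embeddings more flexibly than concatenation or a single scalar weight. The sketch below shows that general mechanism with toy dimensions and a random gate matrix; it is not the paper's exact gating formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical embedding size

word_emb = rng.normal(size=d)  # word-level representation
char_emb = rng.normal(size=d)  # character-level representation

# A learned gate decides, per dimension, how much to take from
# each representation, rather than concatenating them or mixing
# with one scalar weight shared across all dimensions.
W = rng.normal(size=(d, d))
gate = 1.0 / (1.0 + np.exp(-(W @ word_emb)))  # sigmoid gate in (0, 1)
combined = gate * word_emb + (1.0 - gate) * char_emb
print(combined.shape)
```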
Ranked #42 on Question Answering on SQuAD1.1 dev
We propose a framework to improve performance of distantly-supervised relation extraction, by jointly learning to solve two related tasks: concept-instance extraction and relation extraction.
In this paper we study the problem of answering cloze-style questions over documents.
Ranked #2 on Question Answering on CNN / Daily Mail
We present a semi-supervised learning framework based on graph embeddings.
Ranked #1 on Node Classification on USA Air-Traffic
Many methods have been proposed for detecting emerging events in text streams using topic modeling.
To this end, we develop a similarity measure for Java classes using distributional information about how they are used in software, which we combine with corpus statistics on the distribution of contexts in which the classes appear in text.
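The distributional intuition can be sketched with context-count vectors and cosine similarity: classes that occur in similar usage contexts end up close together. The class names, contexts, and counts below are made up for illustration and are not from the paper's corpus.

```python
import numpy as np

# Hypothetical counts of how often each Java class appears in a
# fixed set of usage contexts (e.g. surrounding call patterns).
usage = {
    "FileReader": np.array([9, 1, 0, 0], dtype=float),
    "FileWriter": np.array([1, 8, 0, 0], dtype=float),
    "ArrayList":  np.array([0, 0, 7, 5], dtype=float),
}

def cosine(a, b):
    # Distributional similarity of two context-count vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Classes used in similar contexts score higher than unrelated ones.
io_sim = cosine(usage["FileReader"], usage["FileWriter"])
other_sim = cosine(usage["FileReader"], usage["ArrayList"])
print(io_sim > other_sim)
```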
We show that the problem of constructing proofs for this logic is related to computation of personalized PageRank (PPR) on a linearized version of the proof space, and using this connection, we develop a provably-correct approximate grounding scheme based on the PageRank-Nibble algorithm.
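Personalized PageRank itself is simple to state: random-walk scores with a restart to a seed distribution, computable by power iteration. The toy graph and restart probability below are illustrative; the paper's PageRank-Nibble scheme is a local approximation of this quantity, not the naive iteration shown here.

```python
import numpy as np

# Toy 4-node graph (a path 0-1-2-3) as a column-stochastic
# transition matrix: P[i, j] is the probability of moving j -> i.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
P = A / A.sum(axis=0, keepdims=True)

alpha = 0.15                    # restart (teleport) probability
s = np.array([1.0, 0, 0, 0])    # restart distribution: seed node 0

# Power iteration for the PPR fixed point:
#   ppr = alpha * s + (1 - alpha) * P @ ppr
ppr = s.copy()
for _ in range(100):
    ppr = alpha * s + (1 - alpha) * (P @ ppr)
print(ppr.round(3))  # mass concentrates near the seed node
```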
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus.
In multiclass semi-supervised learning (SSL), it is sometimes the case that the number of classes present in the data is not known, and hence no labeled examples are provided for some classes.
We consider scenarios in which this effect arises in a model of rational decision making which includes the possibility of deceptive information.
In many probabilistic first-order representation systems, inference is performed by "grounding" -- i.e., mapping the query to a propositional representation -- and then performing propositional inference.