Open Information Extraction
60 papers with code • 13 benchmarks • 13 datasets
In natural language processing, open information extraction is the task of generating a structured, machine-readable representation of the information in text, usually in the form of triples or n-ary propositions (Source: Wikipedia).
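As a rough illustration of the output format described above, the sketch below represents an extraction as a relation phrase plus a tuple of argument phrases. Two arguments give a triple and more give an n-ary proposition. The `Proposition` class, its field names, and the example sentence are assumptions made for illustration only; they are not taken from any of the systems listed on this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Proposition:
    """One extracted proposition: a relation phrase plus its argument phrases.

    Hypothetical container for illustration; real OpenIE systems use their own formats.
    """
    relation: str
    arguments: Tuple[str, ...]  # two arguments form a triple; more form an n-ary proposition

    def as_triple(self) -> Tuple[str, str, str]:
        """Return (subject, relation, object), built from the first two arguments."""
        if len(self.arguments) < 2:
            raise ValueError("a triple needs at least a subject and an object")
        return (self.arguments[0], self.relation, self.arguments[1])


# Hand-written extractions (not system output) for the illustrative sentence:
# "Marie Curie won the Nobel Prize in Physics in 1903."
triple = Proposition("won", ("Marie Curie", "the Nobel Prize in Physics"))
nary = Proposition("won", ("Marie Curie", "the Nobel Prize in Physics", "in 1903"))

print(triple.as_triple())
print(len(nary.arguments))
```

Keeping the arguments in one tuple lets the same structure hold both triples and n-ary propositions; a system that only emits triples could store subject and object as separate fields instead.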
Latest papers
Syntactic Multi-view Learning for Open Information Extraction
In this paper, we model both constituency and dependency trees into word-level graphs, and enable neural OpenIE to learn from the syntactic structures.
mOKB6: A Multilingual Open Knowledge Base Completion Benchmark
Automated completion of open knowledge bases (Open KBs), which are constructed from triples of the form (subject phrase, relation phrase, object phrase) obtained via open information extraction (Open IE) systems, is useful for discovering novel facts that may not be directly present in the text.
DetIE: Multilingual Open Information Extraction Inspired by Object Detection
Our model sets the new state of the art performance of 67.7% F1 on CaRB evaluated as OIE2016 while being 3.35x faster at inference than previous state of the art.
Multi-View Clustering for Open Knowledge Base Canonicalization
In this paper, we propose CMVC, a novel unsupervised framework that leverages these two views of knowledge jointly for canonicalizing OKBs without the need of manually annotated labels.
DeepStruct: Pretraining of Language Models for Structure Prediction
We introduce a method for improving the structural understanding abilities of language models.
CompactIE: Compact Facts in Open Information Extraction
Our experiments on CaRB and Wire57 datasets indicate that CompactIE finds 1.5x-2x more compact extractions than previous systems, with high precision, establishing a new state-of-the-art performance in OpenIE.
DOM-LM: Learning Generalizable Representations for HTML Documents
We argue that the text and HTML structure together convey important semantics of the content and therefore warrant a special treatment for their representation learning.
Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents
A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts.
Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph
Instant analysis of cybersecurity reports is a fundamental challenge for security experts as an immeasurable amount of cyber information is generated on a daily basis, which necessitates automated information extraction tools to facilitate querying and retrieval of data.
Refined Commonsense Knowledge from Large-Scale Web Contents
However, prior commonsense knowledge collections are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and strings for P and O.