Document-level Closed Information Extraction

1 paper with code • 3 benchmarks • 3 datasets

Document-level closed information extraction (DocIE) is a subtask of information extraction that seeks to extract from unstructured text a set of triplets, or facts, of the form (subject, relation, object) that are fully linked to a reference knowledge base, i.e., consistent with its predefined set of entities and relations. DocIE comprises subtasks such as mention detection, entity typing, named entity recognition, entity disambiguation, entity linking, coreference resolution, and document-level relation extraction. DocIE is more challenging than sentence-level closed information extraction because it must capture long-range dependencies to extract relations between entities that are far apart in the text. It also requires a coreference resolution stage to group all mentions in the document that refer to the same entity. DocIE is crucial for applications such as knowledge graph construction, question answering, knowledge discovery, and text summarization.

Source: REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking
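As an illustration of the task description above (not taken from the REXEL paper or its code), the following minimal Python sketch shows what a DocIE output might look like: coreference clusters keyed by knowledge-base entity, and triplets kept only if their subject, relation, and object all exist in the closed knowledge base. The knowledge base, the identifiers (E1, born_in, and so on), the example document, and the helper names are all hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


@dataclass(frozen=True)
class Triple:
    subject: str   # KB entity identifier
    relation: str  # KB relation identifier
    obj: str       # KB entity identifier


def find_mention(doc: str, surface: str) -> Mention:
    """Locate the first occurrence of a surface form in the document."""
    start = doc.index(surface)
    return Mention(start, start + len(surface), surface)


def closed_triples(candidates, kb_entities, kb_relations):
    """Keep only triples that are fully linked to the closed KB."""
    return {
        t for t in candidates
        if t.subject in kb_entities
        and t.obj in kb_entities
        and t.relation in kb_relations
    }


# Toy document and toy KB (hypothetical identifiers, for illustration only).
DOC = "Marie Curie was born in Warsaw. She later moved to Paris."
KB_ENTITIES = {"E1", "E2", "E3"}
KB_RELATIONS = {"born_in"}

# Coreference resolution groups every mention of the same entity;
# "She" appears in a different sentence than "Marie Curie".
clusters = {
    "E1": [find_mention(DOC, "Marie Curie"), find_mention(DOC, "She")],
    "E2": [find_mention(DOC, "Warsaw")],
    "E3": [find_mention(DOC, "Paris")],
}

# Candidate triples as a relation extractor might propose them.
candidates = [
    Triple("E1", "born_in", "E2"),
    Triple("E1", "moved_to", "E3"),  # relation not in the KB, so it is dropped
]

result = closed_triples(candidates, KB_ENTITIES, KB_RELATIONS)
print(result)
```

In this sketch the second candidate is discarded because its relation is outside the predefined relation set, which is the "closed" constraint in the definition. A real system would produce the mentions, clusters, links, and candidate triples with learned models rather than by hand.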

Most implemented papers

REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking

amazon-science/e2e-docie 19 Apr 2024

Extracting structured information from unstructured text is critical for many downstream NLP applications and is traditionally achieved by closed information extraction (cIE).