Populating a database with unstructured information is a long-standing
problem in industry and research that encompasses problems of extraction,
cleaning, and integration. Recent names used for this problem include dealing
with dark data and knowledge base construction (KBC)...
In this work, we describe
DeepDive, a system that combines database and machine learning ideas to help
develop KBC systems, and we present techniques to make the KBC process more
efficient. We observe that the KBC process is iterative, and we develop
techniques to incrementally produce inference results for KBC systems. We
propose two methods for incremental inference, based respectively on sampling
and variational techniques. We also study the tradeoff space of these methods
and develop a simple rule-based optimizer. DeepDive includes all of these
contributions, and we evaluate DeepDive on five KBC systems, showing that it
can speed up KBC inference tasks by up to two orders of magnitude with
negligible impact on quality.