Data Integration

72 papers with code • 0 benchmarks • 7 datasets

Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into either a single uniform data set (materialized integration) or a uniform view over the sources (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains of data integration include data warehousing, data lakes, and knowledge base consolidation. Surveys on data integration:

Dong, Srivastava: Big data integration, 2013.
Doan, Halevy, Ives: Principles of Data Integration, 2012.
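
The pipeline subtasks named above can be illustrated with a toy materialized-integration sketch in Python. It is not taken from either survey, and everything in it is hypothetical: the two sources, the hand-written attribute mapping (standing in for schema matching), the normalization rules, the 0.9 name-similarity threshold used for entity resolution, and the "first non-null value wins" fusion policy. Table annotation and data cleansing are omitted.

```python
from difflib import SequenceMatcher

# Two hypothetical heterogeneous sources with different schemas.
SOURCE_A = [
    {"name": "ACME Corp.", "city": "berlin", "employees": "120"},
    {"name": "Globex", "city": "Paris", "employees": None},
]
SOURCE_B = [
    {"company": "Acme Corp", "town": "Berlin", "staff": 125},
    {"company": "Initech", "town": "Austin", "staff": 40},
]

# Schema matching (hand-written here): map each source's attributes
# onto one unified target schema.
MAPPINGS = {
    "A": {"name": "name", "city": "city", "employees": "employees"},
    "B": {"company": "name", "town": "city", "staff": "employees"},
}

def normalize(record):
    """Value normalization: canonical casing, punctuation, and types."""
    emp = record["employees"]
    return {
        "name": record["name"].lower().replace(".", "").strip(),
        "city": record["city"].strip().title(),
        "employees": int(emp) if emp is not None else None,
    }

def to_target(source_id, rows):
    """Rename attributes into the target schema, then normalize values."""
    mapping = MAPPINGS[source_id]
    return [normalize({mapping[k]: v for k, v in row.items()}) for row in rows]

def same_entity(r1, r2, threshold=0.9):
    """Entity resolution: fuzzy name match plus exact city match."""
    sim = SequenceMatcher(None, r1["name"], r2["name"]).ratio()
    return sim >= threshold and r1["city"] == r2["city"]

def fuse(records):
    """Data fusion: per attribute, keep the first non-null value
    (records are in source order, so earlier sources are trusted more)."""
    return {key: next((r[key] for r in records if r[key] is not None), None)
            for key in records[0]}

def integrate(sources):
    """Cluster matching records across sources, then fuse each cluster."""
    clusters = []
    for source_id, rows in sources:
        for rec in to_target(source_id, rows):
            for cluster in clusters:
                if same_entity(cluster[0], rec):
                    cluster.append(rec)
                    break
            else:
                clusters.append([rec])
    return [fuse(c) for c in clusters]

result = integrate([("A", SOURCE_A), ("B", SOURCE_B)])
```

Here the two "Acme" records are resolved to one entity and fused into `{"name": "acme corp", "city": "Berlin", "employees": 120}`; the value 125 from the second source is dropped because the first source is listed first. Practical systems typically learn the mapping and matching rules and use richer conflict-resolution strategies.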

Benchmarks


These leaderboards are used to track progress in data integration.

Evaluation results can be found in the subtasks. You can also submit evaluation metrics for this task.

Libraries

Use these libraries to find data integration models and implementations.

morph-kgc/morph-kgc • 4 papers • 151

Datasets

Subtasks

Most implemented papers


MIMIC-III, a freely accessible critical care database

mit-lcp/mimic-iii-paper • Nature 2016

MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital.

Bayesian Hybrid Matrix Factorisation for Data Integration

ThomasBrouwer/HMF • 17 Apr 2017

We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values.
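
To make "prediction of missing values" by matrix factorisation concrete, here is a minimal Python/NumPy sketch of plain, non-Bayesian matrix factorisation fitted by gradient descent on the observed entries only, with `NaN` marking missing entries. It is not the HMF model: it factorises a single matrix, performs only in-matrix prediction, and its rank, learning rate, regularisation weight and epoch count are illustrative assumptions.

```python
import numpy as np

def mf_impute(R, rank, lr=0.01, reg=0.01, epochs=5000, seed=0):
    """Fill NaN entries of R with U @ V.T, where U and V are fitted by
    gradient descent on the observed entries only."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    mask = ~np.isnan(R)                 # True where a value is observed
    R0 = np.where(mask, R, 0.0)         # zeros are ignored via the mask
    for _ in range(epochs):
        E = mask * (U @ V.T - R0)       # residuals on observed entries only
        gU = E @ V + reg * U            # gradient of squared error + L2 penalty
        gV = E.T @ U + reg * V
        U -= lr * gU
        V -= lr * gV
    # Keep observed values; predict only the missing ones.
    return np.where(mask, R, U @ V.T)

R = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])  # a rank-1 matrix
R[2, 3] = np.nan                                     # hide the true value 12
filled = mf_impute(R, rank=1)
```

Because the example matrix is rank 1, the hidden entry should be recovered as a value close to 3 × 4 = 12; the small L2 penalty shrinks it very slightly.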

COMO: A Pipeline for Multi-Omics Data Integration in Metabolic Modeling and Drug Discovery

helikarlab/como • 4 Nov 2020

Identifying potential drug targets using metabolic modeling requires integrating multiple modeling methods and heterogeneous biological datasets, which can be challenging without sophisticated tools.

Scalable Randomized Kernel Methods for Multiview Data Integration and Prediction

lasandrall/randmvlearn • 10 Apr 2023

We develop scalable randomized kernel methods for jointly associating data from multiple sources and simultaneously predicting an outcome or classifying a unit into one of two or more classes.
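
One widely used randomized kernel technique is random Fourier features (Rahimi and Recht), which replace an RBF kernel with an explicit random feature map so that linear methods on the features avoid forming the full n × n kernel matrix. The excerpt does not say whether the paper uses exactly this construction; the sketch below, including the per-view concatenation and the `gamma` and `n_features` values, is an illustrative assumption.

```python
import numpy as np

def rbf(X, Y, gamma):
    """Exact RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def rff(X, n_features, gamma, seed=0):
    """Random Fourier features z(x) with z(x) . z(y) ~= rbf(x, y)."""
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the RBF kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def multiview_features(views, n_features, gamma):
    """Hypothetical multiview integration: one random feature map per view,
    concatenated, so inner products approximate the sum of view kernels."""
    return np.hstack([rff(V, n_features, gamma, seed=i)
                      for i, V in enumerate(views)])

rng = np.random.default_rng(1)
view1 = rng.standard_normal((5, 3))   # 5 samples, 3 features in view 1
view2 = rng.standard_normal((5, 4))   # same 5 samples, 4 features in view 2
Z = multiview_features([view1, view2], n_features=5000, gamma=0.5)
K_approx = Z @ Z.T
K_exact = rbf(view1, view1, 0.5) + rbf(view2, view2, 0.5)
```

A linear model (for example, ridge regression or logistic regression) fitted on `Z` then approximates a kernel model on the combined views, at a cost that grows with `n_features` rather than with the square of the sample count.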

Heter-LP: A heterogeneous label propagation algorithm and its application in drug repositioning

dkrlab/Heter-LP-code • 8 Nov 2016

As a result, it is necessary for drug development studies to conduct an investigation into the interrelationships of drugs, protein targets, and diseases.

Neuro-symbolic representation learning on biological knowledge graphs

bio-ontology-research-group/walking-rdf-and-owl • 13 Dec 2016

Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries.

A Unified Joint Matrix Factorization Framework for Data Integration

dugzzuli/jmf • 25 Jul 2017

In this paper, we introduce a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration.
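
A common baseline for jointly factorizing several data matrices that share the same rows is joint non-negative matrix factorization, which approximates each matrix as X_i ≈ W H_i with one shared factor W. The sketch below uses the standard Lee-Seung multiplicative updates; it omits the sparsity and relationship-data regularizers of the JMF framework, and the rank, iteration count and example data are illustrative assumptions.

```python
import numpy as np

def joint_nmf(views, rank, iters=500, eps=1e-9, seed=0):
    """Fit X_i ~= W @ H_i (all factors non-negative) with a shared W."""
    rng = np.random.default_rng(seed)
    W = rng.random((views[0].shape[0], rank))
    Hs = [rng.random((rank, X.shape[1])) for X in views]
    for _ in range(iters):
        # Update each view-specific factor H_i with W held fixed.
        for i, X in enumerate(views):
            H = Hs[i]
            Hs[i] = H * (W.T @ X) / (W.T @ W @ H + eps)
        # Update the shared factor W using all views at once.
        numer = sum(X @ H.T for X, H in zip(views, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W = W * numer / denom
    return W, Hs

def joint_error(views, W, Hs):
    """Total squared Frobenius reconstruction error over all views."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(views, Hs))

rng = np.random.default_rng(2)
W_true = rng.random((6, 3))   # 6 shared samples
views = [W_true @ rng.random((3, 5)), W_true @ rng.random((3, 4))]
W, Hs = joint_nmf(views, rank=3)
```

The rows of `W` give one shared low-dimensional representation of the samples across both views; the multiplicative updates keep all factors non-negative and do not increase the total reconstruction error.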

Evaluating approaches for supervised semantic labeling

NICTA/serene-benchmark • 29 Jan 2018

Relational data sources are still one of the most popular ways to store enterprise or Web data; however, the issue with relational schemas is their lack of a well-defined semantic description.

Joint Estimation and Inference for Data Integration Problems based on Multiple Multi-layered Gaussian Graphical Models

GeorgeMichailidis/JMMLE_code • 9 Mar 2018

Following this, we develop a debiasing technique and asymptotic distributions of inter-layer directed edge weights that utilize already computed neighborhood selection coefficients for nodes in the upper layer.

Leveraging Legacy Data to Accelerate Materials Design via Preference Learning

tsudalab/PrefInt • 25 Oct 2019

Machine learning applications in materials science are often hampered by a shortage of experimental data.
