Data Integration

73 papers with code • 0 benchmarks • 7 datasets

Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into a single uniform data set (materialized integration) or a unified view of the data (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains of data integration include data warehousing, data lakes, and knowledge base consolidation.
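The pipeline subtasks above can be sketched end to end on toy data. The following is a minimal, illustrative example, not a reference implementation: all records, column names, and matching rules are made up, and the entity-resolution step is a naive exact match on a normalized key.

```python
# Minimal sketch of a materialized data-integration pipeline:
# schema matching -> value normalization -> entity resolution -> data fusion.
# All source records, schemas, and rules below are illustrative assumptions.

def normalize(value):
    """Value normalization: trim whitespace and lowercase strings."""
    return value.strip().lower() if isinstance(value, str) else value

# Two heterogeneous sources describing the same real-world entity.
source_a = [{"Name": " Alice Smith ", "Phone": "555-0100"}]
source_b = [{"full_name": "alice smith", "email": "alice@example.org"}]

# Schema matching: map each source schema onto a shared target schema.
schema_map_a = {"Name": "name", "Phone": "phone"}
schema_map_b = {"full_name": "name", "email": "email"}

def to_target(record, schema_map):
    """Rename matched columns and normalize their values."""
    return {schema_map[k]: normalize(v) for k, v in record.items() if k in schema_map}

records = [to_target(r, schema_map_a) for r in source_a] + \
          [to_target(r, schema_map_b) for r in source_b]

# Entity resolution (naive): records sharing a normalized name are
# assumed to denote the same entity. Data fusion: merge attributes,
# keeping the first non-missing value seen for each one.
fused = {}
for rec in records:
    entity = fused.setdefault(rec["name"], {})
    for key, val in rec.items():
        entity.setdefault(key, val)

print(fused["alice smith"])
# {'name': 'alice smith', 'phone': '555-0100', 'email': 'alice@example.org'}
```

Real systems replace each of these steps with far more sophisticated components (e.g. learned schema matchers, probabilistic entity resolution, conflict-resolution rules in fusion), which is exactly what many of the papers below study.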

Most implemented papers

Elastic Coupled Co-clustering for Single-Cell Genomic Data

cuhklinlab/elasticC3 29 Mar 2020

Recent advances in single-cell technologies have enabled us to profile genomic features at unprecedented resolution, and datasets from multiple domains are now available, including datasets that profile different types of genomic features and datasets that profile the same type of genomic feature across different species.

A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning

DOE-NCI-Pilot1/CCLFeatureComparison 30 Apr 2020

By combining various cancer cell line (CCL) drug screening panels, the available data has grown significantly in size, making it possible to study how advances in deep learning can improve drug response predictions.

The scalable Birth-Death MCMC Algorithm for Mixed Graphical Model Learning with Application to Genomic Data Integration

wangnanwei/Birth-death-MCMC-Model-Selection 8 May 2020

Recent advances in biological research have seen the emergence of high-throughput technologies with numerous applications that allow the study of biological mechanisms at an unprecedented depth and scale.

Consistent and Flexible Selectivity Estimation for High-Dimensional Data

yaoshuwang/SelNet-Estimation 20 May 2020

Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion.

An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web

maulikkamdar/LSLODQuery 7 Jun 2020

While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources.

Kernel learning approaches for summarising and combining posterior similarity matrices

acabassi/combine-psms 27 Sep 2020

Here we build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.

SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization

yueyu1030/SumGNN 4 Oct 2020

Furthermore, most previous work focuses on binary DDI prediction, whereas multi-typed DDI pharmacological effect prediction is a more meaningful but harder task.

BayReL: Bayesian Relational Learning for Multi-omics Data Integration

ehsanhajiramezanali/BayReL NeurIPS 2020

High-throughput molecular profiling technologies have produced high-dimensional multi-omics data, enabling systematic understanding of living systems at the genome scale.

Profiling Entity Matching Benchmark Tasks

wbsg-uni-mannheim/EntityMatchingTaskProfiler International Conference on Information & Knowledge Management 2020

In order to enable the exact reproducibility of evaluation results, matching tasks need to contain exactly defined sets of matching and non-matching record pairs, as well as a fixed development and test split.

GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs

NYXFLOWER/GripNet 29 Oct 2020

Heterogeneous graph representation learning aims to learn low-dimensional vector representations of different types of entities and relations to empower downstream tasks.