Data Integration
118 papers with code • 0 benchmarks • 7 datasets
Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into a single uniform data set (materialized integration) or view on the data (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains of data integration include data warehousing, data lakes, and knowledge base consolidation.
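The subtasks above can be illustrated with a minimal, stdlib-only sketch. The record contents, schema mappings, and similarity threshold below are hypothetical, invented purely for illustration; real pipelines use dedicated matchers and blocking strategies.

```python
from difflib import SequenceMatcher

# Hypothetical records from two heterogeneous sources (illustrative data only).
source_a = [{"Name": "ACME Corp.", "City": "Berlin", "Revenue": 120}]
source_b = [{"company_name": "Acme Corporation", "city": "Berlin", "employees": 40}]

# Schema matching: map source-specific attribute names onto a shared schema.
SCHEMA_A = {"Name": "name", "City": "city", "Revenue": "revenue"}
SCHEMA_B = {"company_name": "name", "city": "city", "employees": "employees"}

def normalize(record, mapping):
    """Value normalization: rename attributes into the shared schema."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def same_entity(r1, r2, threshold=0.6):
    """Entity resolution: fuzzy string similarity on the shared name."""
    ratio = SequenceMatcher(None, str(r1["name"]).lower(),
                            str(r2["name"]).lower()).ratio()
    return ratio >= threshold

def fuse(r1, r2):
    """Data fusion: union of attributes; the first source wins on conflicts."""
    return {**r2, **r1}

# Materialized integration: build a single consolidated data set.
integrated = []
for ra in (normalize(r, SCHEMA_A) for r in source_a):
    for rb in (normalize(r, SCHEMA_B) for r in source_b):
        if same_entity(ra, rb):
            integrated.append(fuse(ra, rb))

print(integrated)
```

The fused record combines attributes from both sources ("revenue" from the first, "employees" from the second), which is the essence of data fusion after entities have been matched.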
Benchmarks
These leaderboards are used to track progress in Data Integration
Libraries
Use these libraries to find Data Integration models and implementations
Most implemented papers
IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks
Image segmentation is a vital task for providing human assistance and enhancing autonomy in our daily lives.
MIMIC-III, a freely accessible critical care database
MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital.
Bayesian Hybrid Matrix Factorisation for Data Integration
We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values.
COMO: A Pipeline for Multi-Omics Data Integration in Metabolic Modeling and Drug Discovery
Identifying potential drug targets using metabolic modeling requires integrating multiple modeling methods and heterogeneous biological datasets, which can be challenging without sophisticated tools.
Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics
Many, if not most, systems of interest in science are naturally described as nonlinear dynamical systems.
Beyond Trend and Periodicity: Guiding Time Series Forecasting with Textual Cues
This work introduces a novel Text-Guided Time Series Forecasting (TGTSF) task.
Multimodal Contextualized Semantic Parsing from Speech
We introduce Semantic Parsing in Contextual Environments (SPICE), a task designed to enhance artificial agents' contextual awareness by integrating multimodal inputs with prior contexts.
Heter-LP: A heterogeneous label propagation algorithm and its application in drug repositioning
It is therefore necessary for drug development studies to investigate the interrelationships of drugs, protein targets, and diseases.
Neuro-symbolic representation learning on biological knowledge graphs
Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries.
A Unified Joint Matrix Factorization Framework for Data Integration
In this paper, we introduce a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration.
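The shared-factor idea behind joint matrix factorization can be sketched in a few lines: two data matrices over the same row entities are factorized with a common factor matrix, here via alternating least squares on toy rank-k data. This is a simplified illustration, not the sparse, regularized JMF model of the paper; the dimensions and data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy data matrices sharing the same row entities (e.g., genes x conditions
# and genes x drugs), generated from a common low-rank factor U_true.
k = 2
U_true = rng.normal(size=(8, k))
X1 = U_true @ rng.normal(size=(k, 5))
X2 = U_true @ rng.normal(size=(k, 4))

# Joint factorization: learn one shared U with per-matrix loadings V1, V2
# by alternating least squares (no sparsity/regularization in this sketch).
U = rng.normal(size=(8, k))
for _ in range(50):
    # Fix U, solve U @ V1 = X1 and U @ V2 = X2 for the loadings.
    V1 = np.linalg.lstsq(U, X1, rcond=None)[0]
    V2 = np.linalg.lstsq(U, X2, rcond=None)[0]
    # Fix the loadings, solve for the shared U against both matrices at once.
    V = np.concatenate([V1, V2], axis=1)
    X = np.concatenate([X1, X2], axis=1)
    U = np.linalg.lstsq(V @ V.T, V @ X.T, rcond=None)[0].T

err = float(np.linalg.norm(U @ np.concatenate([V1, V2], axis=1)
                           - np.concatenate([X1, X2], axis=1)))
print(err)
```

Because both matrices constrain the same U, patterns supported by either source inform the shared representation, which is what makes such factorizations useful for integrating heterogeneous data.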