Entity Resolution
49 papers with code • 10 benchmarks • 11 datasets
Entity resolution (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia)
Surveys on entity resolution:
- Christophides et al.: End-to-End Entity Resolution for Big Data: A Survey, 2020.
- Barlaug and Gulla: Neural Networks for Entity Matching: A Survey, 2021.
Entity resolution is closely related to entity alignment, which focuses on matching entities between knowledge bases. Entity linking differs from entity resolution in that it focuses on identifying entity mentions in free text.
Libraries
Use these libraries to find Entity Resolution models and implementations
Datasets
Latest papers
How to Evaluate Entity Resolution Systems: An Entity-Centric Framework with Application to Inventor Name Disambiguation
These benchmark data sets can then be used for model training and a variety of evaluation tasks.
Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration
However, existing ICL approaches to ER typically necessitate providing a task description and a set of demonstrations for each entity pair and thus have limitations on the monetary cost of interfacing LLMs.
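The cost concern in the sentence above can be made concrete with a little arithmetic. The following is a rough sketch, not the paper's method: it counts prompt tokens when every entity pair carries its own task description and demonstrations, and compares that with a hypothetical variant in which several pairs share one prompt. The batching variant, the function names, and all token counts are assumptions chosen for illustration.

```python
def prompt_tokens_per_pair(n_pairs, task_tokens, demo_tokens, n_demos, pair_tokens):
    """Each pair gets its own prompt: task description + demonstrations + the pair."""
    return n_pairs * (task_tokens + n_demos * demo_tokens + pair_tokens)


def prompt_tokens_batched(n_pairs, task_tokens, demo_tokens, n_demos, pair_tokens, batch_size):
    """Hypothetical variant: task description and demonstrations are shared by a batch."""
    n_prompts = -(-n_pairs // batch_size)  # ceiling division
    shared = n_prompts * (task_tokens + n_demos * demo_tokens)
    return shared + n_pairs * pair_tokens


# Illustrative numbers only (assumed, not taken from any paper).
per_pair = prompt_tokens_per_pair(
    n_pairs=1000, task_tokens=50, demo_tokens=60, n_demos=10, pair_tokens=60
)
batched = prompt_tokens_batched(
    n_pairs=1000, task_tokens=50, demo_tokens=60, n_demos=10, pair_tokens=60, batch_size=10
)
print(per_pair, batched)
```

With these assumed numbers the per-pair design sends 710,000 prompt tokens and the shared-prompt variant 125,000. Whether sharing a prompt affects matching quality is a separate question that this sketch does not address.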
Entity Matching using Large Language Models
We show that for use cases that do not allow data to be shared with third parties, open-source LLMs can be a viable alternative to hosted LLMs given that a small amount of training data or matching knowledge...
A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms
Entity resolution (ER) is the process of identifying records that refer to the same entities within one or across multiple databases.
Using ChatGPT for Entity Matching
Always using the same set of 10 handpicked demonstrations leads to an improvement of 4.92% over the zero-shot performance.
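To illustrate the difference between zero-shot prompting and prompting with a fixed set of demonstrations, here is a minimal sketch of prompt construction. The prompt wording, the `build_prompt` helper, and the two demonstrations are made up for illustration; they are not the prompts or the 10 demonstrations used in the paper, and no model is called.

```python
# Hypothetical, hand-written demonstrations: (left record, right record, label).
FIXED_DEMONSTRATIONS = [
    ("Apple iPhone 13 128GB", "apple iphone 13 128 gb", "Yes"),
    ("Samsung Galaxy S21", "Dell XPS 13 laptop", "No"),
]


def build_prompt(left, right, demonstrations=()):
    """Return a matching prompt; with no demonstrations it is a zero-shot prompt."""
    lines = [
        "Do the two entity descriptions refer to the same real-world entity? "
        "Answer Yes or No.",
        "",
    ]
    # The same demonstrations are prepended for every pair to be matched.
    for demo_left, demo_right, label in demonstrations:
        lines += [f"Entity 1: {demo_left}", f"Entity 2: {demo_right}", f"Answer: {label}", ""]
    # The pair to classify comes last, with the answer left open.
    lines += [f"Entity 1: {left}", f"Entity 2: {right}", "Answer:"]
    return "\n".join(lines)


zero_shot = build_prompt("Sony WH-1000XM4", "Sony WH1000XM4 headphones")
few_shot = build_prompt("Sony WH-1000XM4", "Sony WH1000XM4 headphones", FIXED_DEMONSTRATIONS)
print(few_shot)
```

Because the demonstrations are fixed, they only need to be selected once, but they are still repeated in every prompt.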
Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration
The widely used practice is to build task-specific or even dataset-specific solutions, which are hard to generalize and disable the opportunities of knowledge sharing that can be learned from different datasets and multiple tasks.
Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark]
This is applied to both main steps of ER, i.e., blocking and matching.
SC-Block: Supervised Contrastive Blocking within Entity Resolution Pipelines
To reduce these runtimes, entity resolution pipelines are constructed of two parts: a blocker that applies a computationally cheap method to select candidate record pairs, and a matcher that afterwards identifies matching pairs from this set using more expensive methods.
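The two-part pipeline described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not SC-Block itself: the blocker here is a simple shared-token index and the matcher is character-level similarity from Python's standard library, both swapped in for the learned components a real pipeline would use. The records, function names, and the 0.8 threshold are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_by_token(records):
    """Cheap blocker: keep only pairs of records that share at least one token."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for token in text.lower().split():
            index[token].add(rec_id)
    candidates = set()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            candidates.add((a, b))
    return candidates


def match(records, candidates, threshold=0.8):
    """More expensive matcher: string similarity, run on candidate pairs only."""
    matches = set()
    for a, b in candidates:
        score = SequenceMatcher(None, records[a].lower(), records[b].lower()).ratio()
        if score >= threshold:
            matches.add((a, b))
    return matches


# Hypothetical product records.
records = {
    1: "Apple iPhone 13 128GB",
    2: "apple iphone 13 128 gb",
    3: "Samsung Galaxy S21",
    4: "Dell XPS 13 laptop",
}
candidates = block_by_token(records)
matches = match(records, candidates)
print(len(candidates), matches)
```

On this toy data the blocker keeps 3 of the 6 possible pairs, so the matcher runs on half as many comparisons. The pair (1, 4) survives blocking only because both records contain the token "13", which shows why a cheap blocker trades precision for recall and leaves the final decision to the matcher.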
WDC Products: A Multi-Dimensional Entity Matching Benchmark
It also shows that for entity matching contrastive learning is more training data efficient compared to cross-encoders.
PIZZA: A new benchmark for complex end-to-end task-oriented parsing
Much recent work in task-oriented parsing has focused on finding a middle ground between flat slots and intents, which are inexpressive but easy to annotate, and powerful representations such as the lambda calculus, which are expressive but costly to annotate.