Entity Resolution
26 papers with code • 7 benchmarks • 7 datasets
Entity resolution (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia)
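To make the task definition concrete, here is a minimal sketch. It is not the method of any paper listed on this page. It matches records across two hypothetical data sources by comparing their word-token sets with Jaccard similarity and keeps pairs at or above a threshold. The record data, the helper names (`tokens`, `jaccard`, `match_records`), and the 0.5 default threshold are all illustrative assumptions.

```python
from itertools import product


def tokens(text: str) -> set[str]:
    """Lowercase a record, turn punctuation into spaces, and split it into word tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return set(cleaned.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_records(
    source_a: dict[str, str],
    source_b: dict[str, str],
    threshold: float = 0.5,  # hypothetical cut-off, would normally be tuned
) -> list[tuple[str, str, float]]:
    """Compare every cross-source pair and keep those whose similarity meets the threshold."""
    matches = []
    for (id_a, rec_a), (id_b, rec_b) in product(source_a.items(), source_b.items()):
        score = jaccard(tokens(rec_a), tokens(rec_b))
        if score >= threshold:
            matches.append((id_a, id_b, score))
    return matches


# Hypothetical product listings from two data sources.
source_a = {"a1": "Apple iPhone 12, 64GB, Black", "a2": "Samsung Galaxy S21 Ultra"}
source_b = {
    "b1": "iPhone 12 64GB black (Apple)",
    "b2": "Galaxy S21 Ultra by Samsung",
    "b3": "Google Pixel 5",
}

matches = match_records(source_a, source_b)
print(matches)  # [('a1', 'b1', 1.0), ('a2', 'b2', 0.8)]
```

This sketch compares all cross-source pairs, so its cost is quadratic in the number of records. Practical systems typically add a blocking step to prune candidate pairs, and they may cluster matched pairs into entities.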
Surveys on entity resolution:
- Christophides et al.: End-to-End Entity Resolution for Big Data: A Survey, 2020.
- Barlaug and Gulla: Neural Networks for Entity Matching: A Survey, 2021.
The task of entity resolution is closely related to entity alignment, which focuses on matching entities between knowledge bases. Entity linking differs from entity resolution in that it focuses on identifying entity mentions in free text and linking them to entities in a knowledge base.
Most implemented papers
d-blink: Distributed End-to-End Bayesian Entity Resolution
Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers.
Intermediate Training of BERT for Product Matching
Adding the masked language modeling objective in the intermediate training step in order to further adapt the language model to the application domain leads to an additional increase of up to 3% F1.
A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching
We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.
In Search of an Entity Resolution OASIS: Optimal Asymptotic Sequential Importance Sampling
Entity resolution (ER) presents unique challenges for evaluation methodology.
Deep Learning for Entity Matching: A Design Space Exploration
Entity matching (EM) finds data instances that refer to the same real-world entity.
Learning Text Representations for 500K Classification Tasks on Named Entity Disambiguation
Named Entity Disambiguation algorithms typically learn a single model for all target entities.
Crowdsourcing and Aggregating Nested Markable Annotations
One of the key steps in language resource creation is the identification of the text segments to be annotated, or markables, which depending on the task may vary from nominal chunks for named entity resolution to (potentially nested) noun phrases in coreference resolution (or mentions) to larger text segments in text segmentation.
Optimal Transport-based Alignment of Learned Character Representations for String Similarity
We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection.
ZeroER: Entity Resolution using Zero Labeled Examples
We investigate an important problem that vexes practitioners: is it possible to design an effective algorithm for ER that requires Zero labeled examples, yet can achieve performance comparable to supervised approaches?
Accelerating Column Generation via Flexible Dual Optimal Inequalities with Application to Entity Resolution
We tackle optimization of weighted set packing by relaxing integrality in our ILP formulation.
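For orientation, below is a generic textbook formulation of weighted set packing, together with its LP relaxation and dual. It is not necessarily the exact formulation used in the paper. For illustration, assume the ground set R holds records, each candidate set c in C is a hypothesized cluster of records with weight w_c, and x_c = 1 selects that cluster, so each record lands in at most one selected cluster. Relaxing integrality gives the LP. For every nonempty c, the packing constraints together with x_c ≥ 0 already force x_c ≤ 1.

```latex
\begin{aligned}
\text{(ILP)}\quad & \max_{x}\ \sum_{c \in \mathcal{C}} w_c\, x_c
  && \text{s.t.}\ \sum_{c \in \mathcal{C}:\, r \in c} x_c \le 1 \quad \forall r \in \mathcal{R},
  \quad x_c \in \{0, 1\}, \\
\text{(LP)}\quad & \max_{x}\ \sum_{c \in \mathcal{C}} w_c\, x_c
  && \text{s.t.}\ \sum_{c \in \mathcal{C}:\, r \in c} x_c \le 1 \quad \forall r \in \mathcal{R},
  \quad x_c \ge 0, \\
\text{(Dual)}\quad & \min_{\pi}\ \sum_{r \in \mathcal{R}} \pi_r
  && \text{s.t.}\ \sum_{r \in c} \pi_r \ge w_c \quad \forall c \in \mathcal{C},
  \quad \pi_r \ge 0.
\end{aligned}
```

Standard column generation works on a subset of C at a time. It solves the restricted LP, reads off the dual values π_r, and searches for a column with positive reduced cost w_c minus the sum of π_r over r in c. It adds any such column and re-solves. Dual optimal inequalities are, in general, extra constraints on π that preserve at least one dual optimal solution while shrinking the dual search space. How the paper constructs its flexible variants is not shown here.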