Entity Resolution

50 papers with code • 10 benchmarks • 11 datasets

Entity resolution (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia)
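
To make the definition concrete, here is a minimal sketch of the two basic steps: deciding which pairs of records match, then grouping matched records into entities. It assumes records are plain strings and uses token Jaccard similarity with an illustrative threshold of 0.5. These are hypothetical choices, not a method from any paper listed on this page. Every pair is compared, which does not scale, and that is why real systems add blocking (a subtask listed below) and learned matchers.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercase a record string and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def resolve(records: list[str], threshold: float = 0.5) -> list[set[int]]:
    """Group record indices that appear to refer to the same entity.

    All pairs are compared (no blocking). Pairs whose similarity reaches the
    hypothetical `threshold` count as matches, and matches are merged
    transitively with a union-find structure.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    toks = [tokens(r) for r in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[int, set[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


records = [
    "Apple iPhone 13 128GB Blue",
    "iPhone 13 128GB blue Apple",
    "Samsung Galaxy S21 128GB",
    "Galaxy S21 128GB Samsung Black",
]
print(resolve(records))  # [{0, 1}, {2, 3}]
```

The transitive merge is a deliberate simplification: if A matches B and B matches C, all three end up in one entity even when A and C look dissimilar.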

Surveys on entity resolution:

The task of entity resolution is closely related to entity alignment, which focuses on matching entities across knowledge bases. Entity linking differs from entity resolution: it identifies entity mentions in free text and links them to entries in a knowledge base.

Benchmarks

These leaderboards are used to track progress in Entity Resolution.

Dataset	Best Model
Amazon-Google	gpt4-0613_fewshot-10
Abt-Buy	gpt4-0613_zeroshot
WDC Computers-small	BERT
WDC Computers-xlarge	RoBERTa-SupCon
WDC Products-80%cc-seen-medium	gpt4-0613_zeroshot
WDC Watches-small	HG
WDC Products-50%cc-unseen-medium	RoBERTa-base
WDC Watches-xlarge	JointBERT
MusicBrainz20K	ALMSER-GB
WDC Products-80%cc-seen-medium-multi	RoBERTa-SupCon

Libraries

Use these libraries to find Entity Resolution models and implementations.

megagonlabs/rotom

2 papers

Datasets

Subtasks

Blocking

Most implemented papers

d-blink: Distributed End-to-End Bayesian Entity Resolution

ngmarchant/dblink • 13 Sep 2019

Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers.

Intermediate Training of BERT for Product Matching

weyoun2211/productbert-intermediate • DI2KG: International Workshop on Challenges and Experiences from Data Integration to Knowledge Graphs @ VLDB 2020

Adding the masked language modeling objective in the intermediate training step in order to further adapt the language model to the application domain leads to an additional increase of up to 3% F1.
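
Product-matching results such as this one are usually reported as F1 on the positive ("match") class of record pairs. The sketch below shows that computation on hypothetical labels. It assumes binary pair-level labels (1 = match) and is not the paper's exact evaluation protocol.

```python
def match_f1(y_true: list[int], y_pred: list[int]) -> float:
    """F1 of the positive (match) class for binary pair labels, 1 = match."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct matches: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted labels for five record pairs.
print(match_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 2 TP, 1 FP, 1 FN -> F1 = 2/3
```

Non-matching pairs that are correctly rejected (true negatives) do not affect this score, which suits matching tasks where non-matches vastly outnumber matches.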

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

Living-with-machines/DeezyMatch • 17 Sep 2020

We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.
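
Candidate selection means shortlisting gazetteer entries that a noisy place name might refer to, before the final resolution step. As a point of comparison only, the sketch below uses a plain string-similarity baseline built on Python's standard `difflib`. This is a swapped-in technique, not DeezyMatch's deep learning model. The gazetteer, the cutoff and the function name are hypothetical.

```python
import difflib


def select_candidates(toponym: str, gazetteer: list[str], k: int = 3,
                      cutoff: float = 0.6) -> list[str]:
    """Return up to k gazetteer names most similar to an (OCR-noisy) toponym.

    Similarity is difflib's SequenceMatcher ratio on lowercased strings, a
    simple baseline rather than a learned matcher. Names differing only in
    case collapse to a single entry in the lookup table.
    """
    lookup = {name.lower(): name for name in gazetteer}
    matches = difflib.get_close_matches(toponym.lower(), list(lookup),
                                        n=k, cutoff=cutoff)
    return [lookup[m] for m in matches]


# Hypothetical gazetteer; "Manchestcr" simulates an OCR error ("e" read as "c").
gazetteer = ["Manchester", "Manchester-by-the-Sea", "Winchester", "Chester", "London"]
print(select_candidates("Manchestcr", gazetteer))
```

A fixed similarity cutoff trades recall for shortlist size; the appeal of learned matchers for OCR'd text is that they can model systematic character confusions that edit-style similarity treats as arbitrary.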

Can Foundation Models Wrangle Your Data?

hazyresearch/fm_data_tasks • 20 May 2022

Foundation Models (FMs) are models trained on large corpora of data that, at very large scale, can generalize to new tasks without any task-specific finetuning.
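
Using a foundation model for entity matching, as in the zero-shot and few-shot GPT-4 entries on the leaderboard above, amounts to serializing a record pair as text and reading a yes/no answer back. The sketch below covers only that serialization and parsing. The "attribute: value" format, the prompt wording and all function names are assumptions for illustration, not the fm_data_tasks code, and the model call itself is omitted.

```python
def serialize(record: dict[str, str]) -> str:
    """Flatten a record into 'attribute: value' text, skipping empty values."""
    return ". ".join(f"{k}: {v}" for k, v in record.items() if v)


def build_prompt(a: dict[str, str], b: dict[str, str]) -> str:
    """Zero-shot entity-matching prompt (hypothetical wording)."""
    return (
        f"Product A is {serialize(a)}.\n"
        f"Product B is {serialize(b)}.\n"
        "Are Product A and Product B the same? Answer Yes or No.\n"
        "Answer:"
    )


def parse_answer(completion: str) -> bool:
    """Map the model's free-text completion to a match decision."""
    return completion.strip().lower().startswith("yes")


prompt = build_prompt({"title": "iPhone 13", "price": ""},
                      {"title": "Apple iPhone 13", "price": "699"})
print(prompt)
```

A few-shot variant would prepend labeled example pairs in the same format before the query pair.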

Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org

patentsview/patentsview-evaluation • 3 Oct 2022

This paper introduces a novel evaluation methodology for entity resolution algorithms.

PIZZA: A new benchmark for complex end-to-end task-oriented parsing

amazon-science/pizza-semantic-parsing-dataset • 1 Dec 2022

Much recent work in task-oriented parsing has focused on finding a middle ground between flat slots and intents, which are inexpressive but easy to annotate, and powerful representations such as the lambda calculus, which are expressive but costly to annotate.

How to Evaluate Entity Resolution Systems: An Entity-Centric Framework with Application to Inventor Name Disambiguation

olivierbinette/er-evaluation • 8 Apr 2024

These benchmark data sets can then be used for model training and a variety of evaluation tasks.
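
For reference, a common baseline for scoring entity resolution output as clusters is pairwise precision and recall: every pair of records placed in the same predicted cluster is checked against the pairs that share a reference cluster. The sketch below shows that standard pairwise metric, not the entity-centric estimators proposed in the paper. Record IDs are hypothetical, and returning 1.0 when there are no pairs is a convention chosen here.

```python
from itertools import combinations


def matching_pairs(clusters: list[set[str]]) -> set[frozenset[str]]:
    """All unordered record pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def pairwise_scores(predicted: list[set[str]],
                    reference: list[set[str]]) -> tuple[float, float]:
    """Pairwise precision and recall of a predicted clustering."""
    pred, ref = matching_pairs(predicted), matching_pairs(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 1.0  # convention: no pairs, no errors
    recall = tp / len(ref) if ref else 1.0
    return precision, recall


# Reference: a, b, c are one entity; d is alone. The prediction wrongly moves c.
print(pairwise_scores([{"a", "b"}, {"c", "d"}], [{"a", "b", "c"}, {"d"}]))
```

Because the number of pairs grows quadratically with cluster size, large clusters dominate these scores, which is one reason to consider entity-level evaluation instead.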

Towards Universal Dense Blocking for Entity Resolution

tshu-w/uniblocker • 23 Apr 2024

Blocking is a critical step in entity resolution, and the emergence of neural network-based representation models has led to the development of dense blocking as a promising approach for exploring deep semantics in blocking.
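
Dense blocking replaces hand-written blocking keys with vector similarity: each record is embedded, and only its nearest neighbors become candidate pairs for the matcher. In the sketch below, hashed character trigrams stand in for the neural encoder so that it runs without model weights. This is an assumption for illustration and not UniBlocker's model. The search is brute force, whereas practical systems typically use an approximate nearest-neighbor index.

```python
import math
import zlib
from collections import Counter


def embed(text: str, dim: int = 512) -> list[float]:
    """L2-normalized hashed character-trigram vector (stand-in for a neural encoder).

    zlib.crc32 is used instead of hash() because str hashing is randomized
    per process, which would make the vectors non-reproducible.
    """
    padded = f"  {text.lower()} "
    counts = Counter(zlib.crc32(padded[i:i + 3].encode()) % dim
                     for i in range(len(padded) - 2))
    vec = [0.0] * dim
    for idx, c in counts.items():
        vec[idx] = float(c)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense_blocks(left: list[str], right: list[str], k: int = 2) -> dict[int, list[int]]:
    """For each left record, indices of its k most similar right records (cosine)."""
    right_vecs = [embed(r) for r in right]
    blocks: dict[int, list[int]] = {}
    for i, text in enumerate(left):
        q = embed(text)
        # Vectors are unit length, so the dot product equals cosine similarity.
        sims = [sum(a * b for a, b in zip(q, v)) for v in right_vecs]
        blocks[i] = sorted(range(len(right)), key=lambda j: sims[j], reverse=True)[:k]
    return blocks


left = ["Apple iPhone 13 128GB"]
right = ["iphone 13 (128 GB) by Apple", "Samsung Galaxy S21", "Apple Watch Series 7"]
print(dense_blocks(left, right, k=2))
```

Only the k candidates per record are passed to the (more expensive) matcher, so k controls the trade-off between blocking recall and the number of comparisons.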
