Entity Resolution

50 papers with code • 10 benchmarks • 11 datasets

Entity resolution (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia)

Entity resolution is closely related to entity alignment, which focuses on matching entities between knowledge bases. It differs from entity linking, which focuses on identifying entity mentions in free text.
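
As a concrete illustration of the task as defined above, here is a minimal sketch in Python. It assumes records are plain strings, that string similarity from the standard library's `difflib` is an adequate match signal, and that matches are transitive (if A matches B and B matches C, all three are grouped). The function names `normalize`, `similarity`, and `resolve`, and the `0.85` threshold, are illustrative choices and do not come from any of the papers listed on this page.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a similarity score in [0, 1] between two record strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(records, threshold=0.85):
    """Group records whose pairwise similarity meets the threshold.

    Uses a union-find structure so that matches are merged transitively.
    Compares every pair, so it is only suitable for small inputs.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    # Collect records by cluster root, preserving input order.
    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())
```

Calling `resolve` on a list of record strings returns a list of clusters, each a list of records judged to denote the same entity. The all-pairs comparison is quadratic in the number of records; practical systems typically add a blocking step to limit which pairs are compared, which this sketch omits.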

Latest papers with no code

Combining Global and Local Merges in Logic-based Entity Resolution

no code yet • 26 May 2023

In the recently proposed Lace framework for collective entity resolution, logical rules and constraints are used to identify pairs of entity references (e.g., author or paper ids) that denote the same entity.

Beyond Rule-based Named Entity Recognition and Relation Extraction for Process Model Generation from Natural Language Text

no code yet • 6 May 2023

We propose an extension to the PET dataset that incorporates information about linguistic references and a corresponding method for resolving them.

A Framework for Combining Entity Resolution and Query Answering in Knowledge Bases

no code yet • 13 Mar 2023

We propose a new framework for combining entity resolution and query answering in knowledge bases (KBs) with tuple-generating dependencies (tgds) and equality-generating dependencies (egds) as rules.

Another Generic Setting for Entity Resolution: Basic Theory

no code yet • 12 Mar 2023

They treated the functions for matching and merging entity records as black-boxes and introduced four important properties that enable efficient generic ER algorithms.

KAER: A Knowledge Augmented Pre-Trained Language Model for Entity Resolution

no code yet • 12 Jan 2023

Entity resolution has been an essential and well-studied task in data cleaning research for decades.

Introducing Semantics into Speech Encoders

no code yet • 15 Nov 2022

Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information.

Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction

no code yet • 12 May 2022

Experimental results demonstrate that the assumptions made in the previous benchmark construction process do not hold in the open environment; they conceal the main challenges of the task and therefore significantly overestimate the current progress of entity matching.

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings

no code yet • 17 Apr 2022

Embedding techniques represent raw data objects as vectors (so-called "embeddings" or "neural embeddings", since they are mostly generated by neural network models) that expose the hidden semantics of the raw data. Embeddings are highly effective at capturing data similarities, making them one of the most widely used and studied techniques in state-of-the-art similarity query processing research.

Why the Rich Get Richer? On the Balancedness of Random Partition Models

no code yet • 30 Jan 2022

Random partition models are widely used in Bayesian methods for various clustering tasks, such as mixture models, topic models, and community detection problems.