Search Results for author: Rebecca C. Steorts

Found 15 papers, 1 paper with code

A Primer on the Data Cleaning Pipeline

no code implementations • 25 Jul 2023 • Rebecca C. Steorts

The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys, many of which are updated in real time, has grown rapidly over the past decade.

Data Integration

(Almost) All of Entity Resolution

no code implementations • 10 Aug 2020 • Olivier Binette, Rebecca C. Steorts

Whether the goal is to estimate the number of people who live in a congressional district, to estimate the number of individuals who have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications share a common theme: integrating information from multiple sources.

Clustering • Entity Resolution

Random Partition Models for Microclustering Tasks

no code implementations • 4 Apr 2020 • Brenda Betancourt, Giacomo Zanella, Rebecca C. Steorts

Motivated by these issues, we propose a general class of random partition models that satisfy the microclustering property with well-characterized theoretical properties.
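For reference, the microclustering property as it is usually stated in this line of work says that the largest cluster becomes a vanishing fraction of the data. Writing $M_N$ for the size of the largest cluster in a random partition of $N$ data points (notation assumed here for illustration):

```latex
\frac{M_N}{N} \xrightarrow{\;p\;} 0 \qquad \text{as } N \to \infty .
```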

Methodology • Statistics Theory

d-blink: Distributed End-to-End Bayesian Entity Resolution

4 code implementations • 13 Sep 2019 • Neil G. Marchant, Andee Kaplan, Daniel N. Elazar, Benjamin I. P. Rubinstein, Rebecca C. Steorts

Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers.
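To make the definition concrete, here is a minimal deterministic sketch of entity resolution as clustering: score record pairs, then merge matching pairs transitively with a union-find structure. The toy records, the bigram-Jaccard rule in `is_match`, and the 0.5 threshold are all hypothetical. This is a simple baseline for illustration, not the Bayesian method used by d-blink.

```python
from itertools import combinations

# Hypothetical toy records (name, birth_year); no unique identifier is available.
records = [
    ("jon smith", 1980),
    ("john smith", 1980),
    ("jane doe", 1975),
    ("jon smyth", 1980),
    ("jane d", 1975),
]

def find(parent, i):
    # Follow parent pointers to the cluster root, halving the path as we go.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(parent, i, j):
    parent[find(parent, i)] = find(parent, j)

def char_bigrams(s):
    return {s[k:k + 2] for k in range(len(s) - 1)}

def is_match(a, b, threshold=0.5):
    # Hypothetical rule: same birth year and name-bigram Jaccard similarity >= threshold.
    if a[1] != b[1]:
        return False
    x, y = char_bigrams(a[0]), char_bigrams(b[0])
    return len(x & y) / len(x | y) >= threshold

def resolve(records):
    parent = list(range(len(records)))
    for i, j in combinations(range(len(records)), 2):
        if is_match(records[i], records[j]):
            union(parent, i, j)
    # Group record indices by their root: each group is one inferred entity.
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(parent, i), []).append(i)
    return sorted(clusters.values())

print(resolve(records))  # [[0, 1, 3], [2, 4]]
```

Transitive merging is what turns pairwise decisions into entities, but it also lets a few false matches chain unrelated records together; probabilistic ER models instead reason about the clustering as a whole.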

Blocking

Probabilistic Blocking with An Application to the Syrian Conflict

no code implementations • 11 Oct 2018 • Rebecca C. Steorts, Anshumali Shrivastava

Entity resolution seeks to merge databases so as to remove duplicate entries where unique identifiers are typically unknown.
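One widely used form of probabilistic blocking is MinHash locality-sensitive hashing (LSH): similar records are likely, but not guaranteed, to share a bucket, so only pairs within a bucket are compared. The sketch below assumes that scheme; the paper's exact variant may differ. Seeded `zlib.crc32` stands in for a family of random hash functions, and the `bands`/`rows` values are illustrative.

```python
import zlib
from collections import defaultdict
from itertools import combinations

def shingles(s, k=2):
    s = s.lower()
    grams = {s[i:i + k] for i in range(len(s) - k + 1)}
    return grams or {s}  # strings shorter than k become a single token

def minhash_signature(tokens, num_hashes):
    # Assumption: seeded CRC32 approximates independent random hash functions.
    return [min(zlib.crc32(f"{seed}:{t}".encode()) for t in tokens)
            for seed in range(num_hashes)]

def lsh_blocks(strings, bands=4, rows=2):
    buckets = defaultdict(set)
    for idx, s in enumerate(strings):
        sig = minhash_signature(shingles(s), bands * rows)
        for b in range(bands):
            # Records agreeing on every row of some band share a bucket.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

print(lsh_blocks(["john smith", "jon smith", "jane doe", "john smith"]))
```

Increasing `rows` makes each band stricter (fewer false candidates), while increasing `bands` gives similar records more chances to collide (fewer missed matches).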

Blocking • Information Retrieval

A Practical Approach to Proper Inference with Linked Data

no code implementations • 2 Oct 2018 • Andee Kaplan, Brenda Betancourt, Rebecca C. Steorts

Entity resolution (ER), comprising record linkage and de-duplication, is the process of merging noisy databases in the absence of unique identifiers to remove duplicate entities.

Entity Resolution

Performance Bounds for Graphical Record Linkage

no code implementations • 8 Mar 2017 • Rebecca C. Steorts, Matt Barnes, Willie Neiswanger

Record linkage involves merging records in large, noisy databases to remove duplicate entities.

Clustering

Flexible Models for Microclustering with Application to Entity Resolution

no code implementations • NeurIPS 2016 • Giacomo Zanella, Brenda Betancourt, Hanna Wallach, Jeffrey Miller, Abbas Zaidi, Rebecca C. Steorts

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points.
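The Chinese restaurant process (the partition implied by a Dirichlet process mixture) is a standard example of this linear-growth behavior: the largest cluster's share of the data settles at a positive random limit rather than shrinking to zero. A minimal simulation sketch, where `alpha = 1.0` and the seed are arbitrary illustrative choices:

```python
import random

def crp_cluster_sizes(n, alpha, rng):
    sizes = []
    for _ in range(n):
        # Existing cluster k is chosen with weight sizes[k]; a new cluster with weight alpha.
        k = rng.choices(range(len(sizes) + 1), weights=sizes + [alpha])[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return sizes

rng = random.Random(0)
for n in (100, 1000, 10000):
    sizes = crp_cluster_sizes(n, alpha=1.0, rng=rng)
    # Fraction of points in the largest cluster; it does not tend to 0 as n grows.
    print(n, max(sizes) / n)
```

In entity resolution each cluster is one real-world entity with only a handful of records, so a prior that lets clusters absorb a fixed fraction of the data is a poor fit, which motivates microclustering models.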

Clustering • Entity Resolution

Bayesian Learning of Dynamic Multilayer Networks

no code implementations • 7 Aug 2016 • Daniele Durante, Nabanita Mukherjee, Rebecca C. Steorts

Our formulation characterizes the edge probabilities as a function of shared and layer-specific actors' positions in a latent space, with these positions changing in time via Gaussian processes.
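A small sketch of how shared and layer-specific latent positions can combine into an edge probability at a single time point. The logistic link, the layer baseline `mu_k`, and the additive inner-product form are assumptions made for illustration; in the model described above the positions would themselves be functions of time with Gaussian process priors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def edge_probability(mu_k, shared_i, shared_j, layer_i, layer_j):
    # Assumed form: logit(p) = layer baseline + shared similarity + layer-specific similarity.
    eta = mu_k + dot(shared_i, shared_j) + dot(layer_i, layer_j)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical positions for actors i and j at one time t in layer k.
p = edge_probability(-1.0, [0.5, 0.5], [0.5, 0.5], [0.5, 0.0], [1.0, 0.0])
print(p)  # 0.5, since the linear predictor is -1 + 0.5 + 0.5 = 0
```

Actors close in the shared space tend to connect in every layer, while the layer-specific term lets a pair be more or less connected in one layer only.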

Dimensionality Reduction • Gaussian Processes

Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set

no code implementations • 2 Dec 2015 • Jeffrey Miller, Brenda Betancourt, Abbas Zaidi, Hanna Wallach, Rebecca C. Steorts

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points.

Clustering • Entity Resolution

Variational Bayes for Merging Noisy Databases

no code implementations • 17 Oct 2014 • Tamara Broderick, Rebecca C. Steorts

Bayesian entity resolution merges together multiple, noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values.

Bayesian Inference • Entity Resolution

Entity Resolution with Empirically Motivated Priors

no code implementations • 2 Sep 2014 • Rebecca C. Steorts

Our extension to string-valued variables also involves the proposal of a new probabilistic mechanism by which observed record values for string fields can deviate from the values of their associated latent entities.
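One way such a string distortion mechanism can work: when a field is distorted, the observed value `w` is drawn with probability proportional to its empirical frequency times `exp(-c * d(w, y))`, where `y` is the latent value and `d` is a string distance. The sketch assumes Levenshtein distance for `d` and an illustrative `c = 1.0`; treat both as assumptions rather than the paper's exact specification.

```python
import math
from collections import Counter

def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert, delete, substitute cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def distortion_distribution(latent, observed_values, c=1.0):
    # Weight each candidate by empirical frequency times exp(-c * distance to latent).
    freq = Counter(observed_values)
    total = sum(freq.values())
    weights = {w: (n / total) * math.exp(-c * levenshtein(w, latent))
               for w, n in freq.items()}
    z = sum(weights.values())
    return {w: wt / z for w, wt in weights.items()}

dist = distortion_distribution("smith", ["smith", "smyth", "jones", "smith"])
print(dist)
```

The effect is that a distorted "smith" is far more likely to appear as the common, nearby "smyth" than as the equally common but distant "jones".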

Methodology

A Comparison of Blocking Methods for Record Linkage

no code implementations • 11 Jul 2014 • Rebecca C. Steorts, Samuel L. Ventura, Mauricio Sadinle, Stephen E. Fienberg

Record linkage seeks to merge databases and to remove duplicates when unique identifiers are not available.
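The simplest blocking method partitions records by a deterministic key and compares only pairs that share a key. A minimal sketch; the records and the key (first letter of last name plus birth year) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def block_by_key(records, key):
    # Group record indices by key, then emit candidate pairs within each block only.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    return {pair for members in blocks.values() for pair in combinations(members, 2)}

records = [
    {"first": "john", "last": "smith", "year": 1980},
    {"first": "jon", "last": "smith", "year": 1980},
    {"first": "jane", "last": "doe", "year": 1975},
    {"first": "john", "last": "smyth", "year": 1980},
]
# Hypothetical blocking key: first letter of last name plus birth year.
pairs = block_by_key(records, key=lambda r: (r["last"][0], r["year"]))
print(sorted(pairs))  # [(0, 1), (0, 3), (1, 3)], 3 of the 6 possible pairs
```

The trade-off that comparisons of blocking methods examine is visible here: an error in a key field (say, a mistyped birth year) sends a true match to a different block, where it can never be compared.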

Databases • Applications

SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication

no code implementations • 2 Mar 2014 • Rebecca C. Steorts, Rob Hall, Stephen E. Fienberg

We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files.
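A graphical view of linkage helps explain how one model handles both tasks: each record, in any file, points to a latent entity, so records in different files that share an entity are cross-file links and records in the same file that share an entity are within-file duplicates. The `linkage` dictionary below is a hypothetical example of such a record-to-entity assignment, not output from the method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical linkage structure: (file_id, record_id) -> latent entity label.
linkage = {
    ("A", 0): 1, ("A", 1): 2, ("A", 2): 1,
    ("B", 0): 1, ("B", 1): 3,
    ("C", 0): 2,
}

def summarize(linkage):
    # Invert the record -> entity map to list the records attached to each entity.
    by_entity = defaultdict(list)
    for rec, entity in linkage.items():
        by_entity[entity].append(rec)
    cross_file, within_file = set(), set()
    for members in by_entity.values():
        for a, b in combinations(sorted(members), 2):
            (within_file if a[0] == b[0] else cross_file).add((a, b))
    return len(by_entity), cross_file, within_file

n_entities, cross_file, within_file = summarize(linkage)
print(n_entities, sorted(cross_file), sorted(within_file))
```

Because the same assignment encodes both kinds of match, linking across many files and de-duplicating within them need no separate procedures.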

Computation • Applications

A Bayesian Approach to Graphical Record Linkage and De-duplication

no code implementations • 17 Dec 2013 • Rebecca C. Steorts, Rob Hall, Stephen E. Fienberg

We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files.

Methodology
