no code implementations • 25 Jul 2023 • Rebecca C. Steorts
The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, has grown rapidly over the past decade.
no code implementations • 10 Aug 2020 • Olivier Binette, Rebecca C. Steorts
Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals that have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications have a common theme: integrating information from multiple sources.
no code implementations • 4 Apr 2020 • Brenda Betancourt, Giacomo Zanella, Rebecca C. Steorts
Motivated by these issues, we propose a general class of random partition models that satisfy the microclustering property with well-characterized theoretical properties.
Methodology Statistics Theory
4 code implementations • 13 Sep 2019 • Neil G. Marchant, Andee Kaplan, Daniel N. Elazar, Benjamin I. P. Rubinstein, Rebecca C. Steorts
Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers.
no code implementations • 11 Oct 2018 • Rebecca C. Steorts, Anshumali Shrivastava
Entity resolution seeks to merge databases so as to remove duplicate entries where unique identifiers are typically unknown.
no code implementations • 2 Oct 2018 • Andee Kaplan, Brenda Betancourt, Rebecca C. Steorts
Entity resolution (ER), comprising record linkage and de-duplication, is the process of merging noisy databases in the absence of unique identifiers to remove duplicate entities.
no code implementations • 8 Mar 2017 • Rebecca C. Steorts, Matt Barnes, Willie Neiswanger
Record linkage involves merging records in large, noisy databases to remove duplicate entities.
no code implementations • NeurIPS 2016 • Giacomo Zanella, Brenda Betancourt, Hanna Wallach, Jeffrey Miller, Abbas Zaidi, Rebecca C. Steorts
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points.
no code implementations • 7 Aug 2016 • Daniele Durante, Nabanita Mukherjee, Rebecca C. Steorts
Our formulation characterizes the edge probabilities as a function of shared and layer-specific actor positions in a latent space, with these positions changing in time via Gaussian processes.
no code implementations • 2 Dec 2015 • Jeffrey Miller, Brenda Betancourt, Abbas Zaidi, Hanna Wallach, Rebecca C. Steorts
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points.
no code implementations • 17 Oct 2014 • Tamara Broderick, Rebecca C. Steorts
Bayesian entity resolution merges together multiple, noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values.
no code implementations • 2 Sep 2014 • Rebecca C. Steorts
Our extension to string-valued variables also involves the proposal of a new probabilistic mechanism by which observed record values for string fields can deviate from the values of their associated latent entities.
Methodology
no code implementations • 11 Jul 2014 • Rebecca C. Steorts, Samuel L. Ventura, Mauricio Sadinle, Stephen E. Fienberg
Record linkage seeks to merge databases and to remove duplicates when unique identifiers are not available.
Databases Applications
no code implementations • 2 Mar 2014 • Rebecca C. Steorts, Rob Hall, Stephen E. Fienberg
We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files.
Computation Applications
no code implementations • 17 Dec 2013 • Rebecca C. Steorts, Rob Hall, Stephen E. Fienberg
We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files.
Methodology