1 code implementation • 8 Oct 2021 • Serge Aleshin-Guendel, Mauricio Sadinle
Merging datafiles containing information on overlapping sets of entities is a challenging task in the absence of unique identifiers, and is further complicated when some entities are duplicated in the datafiles.
no code implementations • 24 Apr 2019 • Yen-Chi Chen, Mauricio Sadinle
Pattern-mixture models provide a transparent approach for handling missing data, where the full-data distribution is factorized in a way that explicitly shows the parts that can be estimated from observed data alone, and the parts that require identifying restrictions.
Methodology Statistics Theory
no code implementations • 2 Sep 2016 • Mauricio Sadinle, Jing Lei, Larry Wasserman
In most classification tasks there are observations that are ambiguous and therefore difficult to correctly label.
1 code implementation • 25 Jan 2016 • Mauricio Sadinle
The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities.
no code implementations • 30 Jul 2014 • Mauricio Sadinle
Our Bayesian implementation allows us to incorporate prior information on the reliability of the fields in the data file, which is especially useful when no training data are available, and it also provides a proper account of the uncertainty in the duplicate detection decisions.
Applications Methodology
no code implementations • 11 Jul 2014 • Rebecca C. Steorts, Samuel L. Ventura, Mauricio Sadinle, Stephen E. Fienberg
Record linkage seeks to merge databases and to remove duplicates when unique identifiers are not available.
Databases Applications