1 code implementation • 10 Apr 2024 • Nima Shahbazi, Mahdi Erfanian, Abolfazl Asudeh, Fatemeh Nargesian, Divesh Srivastava
Entity matching is one of the earliest tasks in the big data pipeline and is alarmingly exposed to unintentional biases that affect the quality of data.
1 code implementation • 6 Jul 2023 • Nima Shahbazi, Nikola Danevski, Fatemeh Nargesian, Abolfazl Asudeh, Divesh Srivastava
Entity matching (EM) is a challenging problem studied by different communities for over half a century.
no code implementations • 10 Feb 2023 • Joel Rorseth, Parke Godfrey, Lukasz Golab, Mehdi Kargar, Divesh Srivastava, Jaroslaw Szlichta
Towards better explainability in the field of information retrieval, we present CREDENCE, an interactive tool capable of generating counterfactual explanations for document rankers.
no code implementations • 8 Nov 2022 • Cheryl Flynn, Aritra Guha, Subhabrata Majumdar, Divesh Srivastava, Zhengyi Zhou
New technologies and the availability of geospatial data have drawn attention to spatio-temporal biases present in society.
1 code implementation • 24 Mar 2022 • Tommaso Teofili, Donatella Firmani, Nick Koudas, Vincenzo Martello, Paolo Merialdo, Divesh Srivastava
CERTA builds on a probabilistic framework that computes explanations by evaluating the outcomes produced on perturbed copies of the input records.
1 code implementation • 27 Jan 2021 • Valter Crescenzi, Andrea De Angelis, Donatella Firmani, Maurizio Mazzei, Paolo Merialdo, Federico Piai, Divesh Srivastava
A limitation of such benchmarks is that they typically come with their own task definition and it can be difficult to leverage them for complex integration pipelines.
Entity Resolution • Databases
no code implementations • 6 Jan 2021 • Reza Karegar, Parke Godfrey, Lukasz Golab, Mehdi Kargar, Divesh Srivastava, Jaroslaw Szlichta
Order dependencies (ODs) capture relationships between ordered domains of attributes.
Databases
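As a rough illustration of what an order dependency expresses, the sketch below checks whether one attribute orders another in a small table: whenever one row is less than or equal to another on `x`, it must also be less than or equal on `y`. The function name `order_dependency_holds` and the sample columns are hypothetical, and this covers only the single-attribute case, not the discovery algorithms studied in the paper.

```python
def order_dependency_holds(rows, x, y):
    """Return True if attribute x orders attribute y in rows.

    Assumed semantics (single-attribute case): for every pair of rows
    s, t with s[x] <= t[x], we also need s[y] <= t[y]. Ties on x must
    therefore agree on y.
    """
    ordered = sorted(rows, key=lambda r: (r[x], r[y]))
    for prev, curr in zip(ordered, ordered[1:]):
        if prev[x] == curr[x]:
            # Equal x values must map to equal y values.
            if prev[y] != curr[y]:
                return False
        elif prev[y] > curr[y]:
            # x increased, so y must not decrease.
            return False
    return True


# Hypothetical data: tax grows with salary, so salary orders tax.
payroll = [
    {"salary": 30, "tax": 3},
    {"salary": 50, "tax": 5},
    {"salary": 40, "tax": 4},
]
holds = order_dependency_holds(payroll, "salary", "tax")
```

Sorting by `(x, y)` first means only adjacent rows need to be compared, which keeps the check at the cost of one sort.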
1 code implementation • 5 Sep 2019 • Trong Duc Nguyen, Ming-Hung Shih, Sai Sree Parvathaneni, Bojian Xu, Divesh Srivastava, Srikanta Tirthapura
We consider random sampling for answering the ubiquitous class of group-by queries, which first group data according to one or more attributes, and then aggregate within each group after filtering through a predicate.
Databases • Data Structures and Algorithms
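To make the query class concrete, here is a minimal sketch of answering a filtered group-by SUM from a sample. It uses plain stratified sampling with the same sample size for every group, which is an assumption made for illustration and not the allocation scheme proposed in the paper; the function names and row layout are likewise hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, per_group, seed=0):
    """Draw up to per_group rows uniformly at random from each group.

    Returns {group: (group_size, sampled_rows)} so estimates can be
    scaled back up to the full group.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return {
        g: (len(members), rng.sample(members, min(per_group, len(members))))
        for g, members in groups.items()
    }


def estimate_group_sums(sample, value, predicate):
    """Estimate SUM(value) per group over rows that satisfy predicate."""
    estimates = {}
    for g, (size, drawn) in sample.items():
        scale = size / len(drawn)  # each sampled row stands for `scale` rows
        estimates[g] = scale * sum(value(r) for r in drawn if predicate(r))
    return estimates


# Hypothetical rows of (group, value); the query keeps values >= 2.
data = [("a", 1), ("a", 2), ("a", 3), ("b", 10), ("b", 20)]
sample = stratified_sample(data, key=lambda r: r[0], per_group=2)
approx = estimate_group_sums(sample, value=lambda r: r[1], predicate=lambda r: r[1] >= 2)
```

When `per_group` is at least as large as every group, the sample is the full data and the estimates are exact; smaller samples trade accuracy for speed.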
no code implementations • Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data 2014 • Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, Xiaokui Xiao
Given a dataset D, PRIVBAYES first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PRIVBAYES injects noise into each marginal in P to ensure differential privacy, and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PRIVBAYES samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data.
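The noise-then-sample steps described above can be sketched on a toy scale. The code below is a simplified illustration and not the PRIVBAYES implementation: it assumes a fixed chain network A → B → C instead of learning the network privately, splits the privacy budget evenly between two marginals, and adds Laplace noise to cell counts with an assumed sensitivity of 2. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(rows, cols, domains, epsilon, rng):
    """Noisy, normalized marginal over the attribute indices in cols."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    cells = list(product(*(domains[c] for c in cols)))
    # Assumption: replacing one tuple changes two cell counts by 1 each,
    # so Laplace noise with scale 2 / epsilon is added to every count.
    noisy = [max(0.0, counts[cell] + laplace(rng, 2.0 / epsilon)) for cell in cells]
    total = sum(noisy)
    if total == 0:
        return {cell: 1.0 / len(cells) for cell in cells}
    return {cell: v / total for cell, v in zip(cells, noisy)}


def sample_synthetic(rows, domains, epsilon, n, seed=0):
    """Sample n tuples from noisy marginals of an assumed chain A -> B -> C."""
    rng = random.Random(seed)
    m_ab = noisy_marginal(rows, (0, 1), domains, epsilon / 2, rng)
    m_bc = noisy_marginal(rows, (1, 2), domains, epsilon / 2, rng)
    ab_cells = list(m_ab)
    ab_weights = [m_ab[cell] for cell in ab_cells]
    synthetic = []
    for _ in range(n):
        a, b = rng.choices(ab_cells, weights=ab_weights)[0]
        # Pr[C | B = b] is proportional to the noisy (b, c) cells.
        weights = [m_bc[(b, c)] for c in domains[2]]
        if sum(weights) == 0:
            weights = [1.0] * len(weights)
        c = rng.choices(domains[2], weights=weights)[0]
        synthetic.append((a, b, c))
    return synthetic


# Hypothetical binary dataset with three perfectly correlated attributes.
domains = [[0, 1], [0, 1], [0, 1]]
original = [(0, 0, 0)] * 50 + [(1, 1, 1)] * 50
released = sample_synthetic(original, domains, epsilon=1.0, n=20)
```

Only the noisy marginals are consulted when sampling, so the released tuples depend on the original data solely through the noised counts.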