What do row and column marginals reveal about your dataset?

NeurIPS 2013  ·  Behzad Golshan, John Byers, Evimaria Terzi

Numerous datasets ranging from group memberships within social networks to purchase histories on e-commerce sites are represented by binary matrices. While this data is often either proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive, and may be furnished for analysis. Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i,j) of interest. We do this, for all the cells of H simultaneously, without generating realizations but rather via implicitly sampling the datasets that satisfy the input marginals. The end result is an efficient algorithm with running time equal to the time required by standard sampling techniques to generate a single dataset from the same dataspace. Our experimental evaluation demonstrates the efficiency and the efficacy of our framework in multiple settings.
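To make the idea of treating marginals as constraints concrete, the following is a minimal sketch that enumerates every binary matrix consistent with given row and column sums and reports, for each cell, the fraction of those matrices in which the cell is 1. It assumes a uniform distribution over the dataspace and uses brute-force enumeration, so it is only feasible for tiny matrices; it is not the implicit-sampling algorithm the abstract describes. The function name `entry_probabilities` and its return shape are illustrative choices, not taken from the paper.

```python
from itertools import combinations, product


def entry_probabilities(row_sums, col_sums):
    """Brute-force sketch (not the paper's algorithm).

    Returns (probs, count), where count is the number of binary matrices
    matching the marginals and probs[i][j] is the fraction of those
    matrices with a 1 in cell (i, j), assuming all are equally likely.
    """
    n_cols = len(col_sums)
    # For each row, list every way to place its required number of ones.
    row_choices = [list(combinations(range(n_cols), r)) for r in row_sums]

    ones = [[0] * n_cols for _ in row_sums]  # per-cell count of ones
    count = 0                                # number of feasible matrices

    for rows in product(*row_choices):
        # Row sums hold by construction; check the column sums.
        totals = [0] * n_cols
        for cols in rows:
            for j in cols:
                totals[j] += 1
        if totals != list(col_sums):
            continue
        count += 1
        for i, cols in enumerate(rows):
            for j in cols:
                ones[i][j] += 1

    if count == 0:
        raise ValueError("no binary matrix satisfies these marginals")

    probs = [[c / count for c in row] for row in ones]
    return probs, count


if __name__ == "__main__":
    probs, count = entry_probabilities([2, 1, 1], [2, 1, 1])
    print(count)
    for row in probs:
        print(row)
```

For the marginals in the usage lines, five matrices are feasible and cell (0, 0) is 1 in four of them, so its estimated probability is 0.8. Even this toy case shows that marginals alone can make some entries far more predictable than others.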
