Merging joint distributions via causal model classes with low VC dimension

9 Apr 2018 · Dominik Janzing

If $X,Y,Z$ denote sets of random variables, two different data sources may contain samples from $P_{X,Y}$ and $P_{Y,Z}$, respectively. We argue that causal inference can help infer properties of the 'unobserved joint distributions' $P_{X,Y,Z}$ or $P_{X,Z}$. The properties may be conditional independences or quantitative statements about dependences. More generally, we define a learning scenario in which the input is a subset of variables and the label is some statistical property of that subset. Sets of jointly observed variables define the training points, while unobserved sets are possible test points. To solve this learning task, we infer, as an intermediate step, a causal model from the observations that then entails properties of unobserved sets. Accordingly, we can define the VC dimension of a class of causal models and derive generalization bounds for the predictions. Here, causal inference becomes more modest and more amenable to empirical tests than usual: rather than trying to find a causal hypothesis that is 'true' (a problematic term when it is unclear how to define interventions), we consider a causal hypothesis useful whenever it correctly predicts statistical properties of unobserved joint distributions. Within such a 'pragmatic' application of causal inference, some popular heuristic approaches become justified in retrospect. For instance, it is legitimate to infer DAGs from partial correlations instead of conditional independences if the DAGs are only used to predict partial correlations. I hypothesize that our pragmatic view of causality may even cover the usual meaning in terms of interventions and sketch why predicting the impact of interventions can sometimes also be phrased as a task of the above type.
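To make the 'pragmatic' reading concrete, here is a minimal sketch (not taken from the paper) of the simplest instance: under a hypothetical causal chain $X \to Y \to Z$ with linear-Gaussian mechanisms, d-separation entails $X \perp Z \mid Y$, so the correlation of the never-jointly-observed pair $(X,Z)$ is predicted as $\rho_{XZ} = \rho_{XY}\,\rho_{YZ}$. All variable names and coefficients below are illustrative assumptions:

```python
# Minimal sketch (not from the paper): merging two overlapping data sets
# under the assumed causal hypothesis X -> Y -> Z. For a linear-Gaussian
# chain, X _||_ Z | Y implies rho_XZ = rho_XY * rho_YZ, so the DAG lets
# us predict a dependence that was never jointly observed.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth: a linear-Gaussian chain X -> Y -> Z
# (coefficients are arbitrary choices for illustration).
n = 100_000
x = rng.standard_normal(n)
y = 0.8 * x + rng.standard_normal(n)
z = -0.5 * y + rng.standard_normal(n)

# Two separate "data sources": one observes (X, Y), the other (Y, Z).
xy_sample = np.stack([x[: n // 2], y[: n // 2]])
yz_sample = np.stack([y[n // 2 :], z[n // 2 :]])

rho_xy = np.corrcoef(xy_sample)[0, 1]
rho_yz = np.corrcoef(yz_sample)[0, 1]

# Prediction entailed by the hypothesis X -> Y -> Z: d-separation gives
# X _||_ Z | Y, i.e. vanishing partial correlation rho_{XZ.Y}, hence
# rho_XZ = rho_XY * rho_YZ in the Gaussian case.
rho_xz_predicted = rho_xy * rho_yz

# The "unobserved" joint distribution P_{X,Z} is available here only
# because the data are synthetic; it plays the role of the test label.
rho_xz_true = np.corrcoef(x, z)[0, 1]

print(f"predicted rho_XZ = {rho_xz_predicted:.3f}")
print(f"true      rho_XZ = {rho_xz_true:.3f}")
```

As for the generalization bounds, the paper's specific statements are not reproduced here; as a rough sketch, a class $\mathcal{C}$ of causal models with VC dimension $d$, evaluated on $n$ jointly observed training sets, would satisfy a textbook-style uniform VC bound (constants vary across formulations): with probability at least $1-\delta$,

$$\sup_{C \in \mathcal{C}} \left| \mathrm{err}(C) - \widehat{\mathrm{err}}_n(C) \right| \;\le\; \sqrt{\frac{8\left(d \ln(2en/d) + \ln(4/\delta)\right)}{n}},$$

where $\mathrm{err}(C)$ is the probability that model $C$ mispredicts the statistical property of a randomly drawn variable set and $\widehat{\mathrm{err}}_n(C)$ is its empirical error on the observed sets.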

Categories

Statistics Theory