Direct Learning with Guarantees of the Difference DAG Between Structural Equation Models
Discovering cause-effect relationships between variables from observational data is a fundamental challenge in many scientific disciplines. However, in many situations it is desirable to directly estimate the change in causal relationships across two different conditions, e.g., estimating the change in genetic expression across healthy and diseased subjects can help isolate genetic factors behind the disease. This paper focuses on the problem of directly estimating the structural difference between two structural equation models (SEMs) that share the same topological ordering, given two sets of samples drawn from the individual SEMs. We present a principled algorithm that can recover the difference SEM from $\mathcal{O}(d^2 \log p)$ samples, where $d$ is related to the number of edges in the difference SEM of $p$ nodes. We also study the fundamental limits and show that any method requires at least $\Omega(d' \log \frac{p}{d'})$ samples to learn difference SEMs with at most $d'$ parents per node. Finally, we validate our theoretical results with synthetic experiments and show that our method outperforms state-of-the-art methods. Moreover, we demonstrate the usefulness of our method on data from the medical domain.
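To make the problem setup concrete, the following is a minimal sketch of a toy instance: two linear SEMs over the same topological ordering, samples drawn from each, and the ground-truth difference DAG that a method would aim to recover. It illustrates only the setting, not the algorithm presented in the paper. The linear form, the Gaussian noise, the edge weights, and the names `sample_linear_sem`, `B1`, `B2`, and `delta` are all assumptions made for illustration.

```python
import numpy as np


def sample_linear_sem(B, n, rng):
    """Draw n samples from a linear SEM X = X B + noise (assumed form).

    B[i, j] is the weight of edge i -> j. Nodes are indexed in topological
    order, so B is strictly upper triangular and (I - B) is invertible.
    """
    p = B.shape[0]
    noise = rng.standard_normal((n, p))  # assumed standard Gaussian noise
    # X = X B + noise  =>  X (I - B) = noise  =>  X = noise (I - B)^{-1}
    return noise @ np.linalg.inv(np.eye(p) - B)


rng = np.random.default_rng(0)

# First condition: a chain 0 -> 1 -> 2 -> 3 (hypothetical weights).
B1 = np.zeros((4, 4))
B1[0, 1], B1[1, 2], B1[2, 3] = 0.8, -0.5, 0.7

# Second condition: same ordering, edge 1 -> 2 removed, edge 0 -> 3 added.
B2 = B1.copy()
B2[1, 2] = 0.0
B2[0, 3] = 0.6

# The two sample sets that a difference-learning method would receive.
X1 = sample_linear_sem(B1, 500, rng)
X2 = sample_linear_sem(B2, 500, rng)

# Ground-truth difference: nonzero entries are the edges that changed.
delta = B1 - B2
difference_edges = [(int(i), int(j)) for i, j in np.argwhere(delta != 0)]
print(difference_edges)  # -> [(0, 3), (1, 2)]
```

In this toy case the two graphs share most of their edges and differ in only two, which is the regime where estimating the difference directly may need fewer samples than learning each SEM separately and subtracting.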