Our approach is simulation-efficient: it simultaneously estimates low-dimensional marginal posteriors instead of the joint posterior, and it proposes simulations targeted to an observation of interest via a prior suitably truncated by an indicator function.
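To make the truncation step concrete, the sketch below rejection-samples parameters from a prior restricted by an indicator function. The names `prior_sample` and `indicator` are hypothetical placeholders, and the paper's own truncation scheme may be implemented differently.

```python
import numpy as np

def sample_truncated_prior(prior_sample, indicator, n, rng=None):
    """Draw n parameter vectors from a prior restricted by an indicator.

    prior_sample(rng, k): returns a (k, dim) array of draws from the untruncated prior.
    indicator(theta):     boolean mask over rows, True where theta is still
                          compatible with the observation of interest.
    Plain rejection sampling, shown only to make the truncation idea concrete.
    """
    rng = rng or np.random.default_rng()
    accepted = []
    while sum(len(a) for a in accepted) < n:
        theta = prior_sample(rng, n)
        accepted.append(theta[indicator(theta)])
    return np.concatenate(accepted)[:n]
```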
We argue that the particular choice of coordinatization should not affect a network's inference -- it should be coordinate independent.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow) and an argmax function.
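A minimal, self-contained sketch of this composition (not the paper's implementation): the continuous part below is just a Gaussian with learned means standing in for a normalizing flow, and discrete samples are obtained by taking the argmax over the class dimension.

```python
import torch

class ToyArgmaxSampler(torch.nn.Module):
    """Categorical sampler built as 'continuous distribution -> argmax'."""

    def __init__(self, seq_len, num_classes):
        super().__init__()
        # Learned location of the continuous distribution; in an actual
        # Argmax Flow a normalizing flow would replace this simple Gaussian.
        self.loc = torch.nn.Parameter(torch.zeros(seq_len, num_classes))

    def sample(self, n):
        v = self.loc + torch.randn(n, *self.loc.shape)  # continuous draw in R^{seq_len x K}
        return v.argmax(dim=-1)                         # discrete symbols in {0, ..., K-1}

sampler = ToyArgmaxSampler(seq_len=16, num_classes=27)
tokens = sampler.sample(4)  # (4, 16) integer sequences
```

Training additionally requires a probabilistic right-inverse of the argmax, i.e. a way to lift discrete data back into the continuous space, which this sampling-only sketch omits.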
Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework.
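To see why this is costly in general, the hedged sketch below computes the log-determinant by materializing the full Jacobian with autograd, which needs one backward pass per output dimension; the toy map `f` is assumed for illustration, and flow architectures are designed precisely to avoid this brute-force route.

```python
import torch

def naive_logdet_jacobian(f, x):
    """log|det J_f(x)| via the full Jacobian; O(D) backward passes per point."""
    J = torch.autograd.functional.jacobian(f, x, create_graph=True)
    return torch.linalg.slogdet(J)[1]

# Toy invertible map on R^3 (diagonally dominant weight keeps the Jacobian nonsingular).
weight = torch.randn(3, 3) + 3.0 * torch.eye(3)
f = lambda v: torch.tanh(v) @ weight.T
x = torch.randn(3, requires_grad=True)

logdet = naive_logdet_jacobian(f, x)
logdet.backward()  # gradient of the log-det term with respect to the input
```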
Modern neural networks, although achieving state-of-the-art results on many tasks, tend to have a large number of parameters, which increases training time and resource usage.
With the introduction of SNIP [arXiv:1810.02340v2], it has been demonstrated that modern neural networks can effectively be pruned before training.
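A SNIP-style sketch (not the reference implementation) of what pruning before training looks like: weights are scored by |weight × gradient| on a single minibatch at initialization, and only the highest-scoring fraction is kept. Normalization details and which layers are exempt follow the paper, not this sketch.

```python
import torch

def snip_style_masks(model, loss_fn, batch, sparsity):
    """One-shot pruning at initialization in the spirit of SNIP."""
    inputs, targets = batch
    loss = loss_fn(model(inputs), targets)
    # Score only weight tensors (dim > 1); biases and norm parameters stay dense.
    params = [p for p in model.parameters() if p.dim() > 1]
    grads = torch.autograd.grad(loss, params)
    scores = torch.cat([(p * g).abs().flatten() for p, g in zip(params, grads)])
    keep = max(1, int((1.0 - sparsity) * scores.numel()))
    threshold = scores.topk(keep).values.min()
    return [((p * g).abs() >= threshold).float() for p, g in zip(params, grads)]
```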
We argue that causal concepts can explain the success of data augmentation by describing how augmentation can weaken the spurious correlation between the observed domains and the task labels.
This enables us to identify superfluous information as that not shared by both views.
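One standard way to make "not shared by both views" precise, written in notation assumed here rather than taken from the sentence above: with views v_1, v_2 and a representation z of v_1, the shared content is the information z carries about the other view, and the superfluous part is what remains once that view is accounted for.

```latex
% notation assumed for illustration: v_1, v_2 are the two views, z is a representation of v_1
\text{shared: } I(z; v_2), \qquad \text{superfluous: } I(z; v_1 \mid v_2).
```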
Unfortunately, this research has primarily focused on distributions defined in Euclidean space, ruling out the use of one of the most influential classes of spaces with non-trivial topologies: Lie groups.
We prove the main rules of causal calculus (also called do-calculus) for i/o structural causal models (ioSCMs), a generalization of a recently proposed general class of non-/linear structural causal models that allow for cycles, latent confounders and arbitrary probability distributions.
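For orientation, these are the three rules in Pearl's classical formulation; the ioSCM versions proved in the paper have analogous shapes but with graphical conditions adapted to cycles and latent confounders, so the separation statements below should be read as the classical special case rather than the general result.

```latex
\begin{align*}
\text{Rule 1: } & P(y \mid \mathrm{do}(x), z, w) = P(y \mid \mathrm{do}(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}},\\
\text{Rule 2: } & P(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = P(y \mid \mathrm{do}(x), z, w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}},\\
\text{Rule 3: } & P(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = P(y \mid \mathrm{do}(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}.
\end{align*}
```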
We show that minimizing the p-Wasserstein distance between the generator and the true data distribution is equivalent to the unconstrained min-min optimization of the p-Wasserstein distance between the encoder aggregated posterior and the prior in latent space, plus a reconstruction error.
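Schematically, and with notation assumed here (decoder g, encoder e, data distribution P_X, latent prior P_Z; the exact reconstruction cost and any constants or powers follow the paper's theorem, not this sketch), the claimed equivalence reads:

```latex
\min_{g}\; W_p\bigl(g_{\#}P_Z,\, P_X\bigr)
  \;=\;
\min_{g}\;\min_{e}\;\Bigl[\;
  \underbrace{\mathbb{E}_{x \sim P_X}\, d\bigl(x,\, g(e(x))\bigr)}_{\text{reconstruction error}}
  \;+\;
  \underbrace{W_p\bigl(e_{\#}P_X,\, P_Z\bigr)}_{\text{latent-space term}}
\;\Bigr]
```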
Our experiments show that choosing manifold-valued latent variables that match the topology of the latent data manifold is crucial for preserving the topological structure and learning a well-behaved latent space.
We address the problem of causal discovery from data, making use of the recently proposed causal modeling framework of modular structural causal models (mSCM) to handle cycles, latent confounders and non-linearities.
In this paper, we investigate SCMs in a more general setting, allowing for the presence of both latent confounders and cycles.