Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology.
A goal shared across settings such as continual learning and transfer learning is to leverage previously acquired knowledge to converge faster on the current task.
Highly overparametrized neural networks can display curiously strong generalization performance, a phenomenon that has recently attracted a wealth of theoretical and empirical research aimed at better understanding it.
We present a lower bound on the likelihood of this model and show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.
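For context, the Bhattacharyya distance invoked above is the standard quantity; the particular bottom-up and top-down distributions it compares are defined by the model and are not restated here. For two densities p and q over a common support it is

D_B(p, q) = -\ln \int \sqrt{p(x)\, q(x)}\, dx,

which vanishes exactly when p = q, so driving it toward zero pushes the two approximate distributions into agreement.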
Neuroscientists have long criticised deep learning algorithms as incompatible with current knowledge of neurobiology.