Hierarchical correlation reconstruction with missing data, for example for biology-inspired neuron

17 Apr 2018 · Jarek Duda

Machine learning often needs to model density from a multidimensional data sample, including correlations between coordinates. Additionally, data is often incomplete: data points can be missing values for some coordinates. This article adapts the rapid parametric density estimation approach for this purpose: density is modelled as a linear combination of orthonormal functions, for which $L^2$ optimization says that the (independently) estimated coefficient for a given function is just the average of that function's values over the sample. Hierarchical correlation reconstruction first models the probability density of each separate coordinate using all its appearances in the data sample, then adds corrections from independently modelled pairwise correlations using all samples containing both coordinates, and so on, independently adding correlations for growing numbers of variables, typically supported by less and less evidence in the data sample. A basic application of such a modelled multidimensional density is imputation of missing coordinates: inserting the known coordinates into the density and taking expected values for the missing coordinates, or even their entire joint probability distribution. The presented method can be compared with the cascade correlation approach, offering several advantages in flexibility and accuracy. It can also be used as an artificial neuron: maximizing prediction capabilities for only local behavior, modelling and predicting local connections.
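
The estimation and imputation steps described above can be illustrated for two coordinates. The following is only a minimal sketch under stated assumptions, not the article's implementation: coordinates are assumed to be already rescaled to $[0,1]$, the orthonormal family is assumed to be the first three rescaled Legendre polynomials, missing values are assumed to be encoded as NaN, and the names `basis`, `fit_pair`, `density` and `impute_y` are hypothetical.

```python
import numpy as np

def basis(x, m=3):
    """First m (<= 3) rescaled Legendre polynomials, orthonormal on [0, 1]."""
    x = np.asarray(x, dtype=float)
    polys = [np.ones_like(x),
             np.sqrt(3.0) * (2.0 * x - 1.0),
             np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0)]
    return np.stack(polys[:m], axis=-1)          # shape (..., m)

def fit_pair(data, m=3):
    """Coefficients a[j, k] of rho(x, y) = sum_jk a[j, k] f_j(x) f_k(y).

    Each coefficient is a plain sample average of the corresponding basis
    function, taken over the rows where the needed coordinates are present.
    """
    x, y = data[:, 0], data[:, 1]
    has_x, has_y = ~np.isnan(x), ~np.isnan(y)
    both = has_x & has_y
    a = np.zeros((m, m))
    a[:, 0] = basis(x[has_x], m).mean(axis=0)    # marginal of x: all its appearances
    a[0, :] = basis(y[has_y], m).mean(axis=0)    # marginal of y: all its appearances
    fx, fy = basis(x[both], m), basis(y[both], m)
    # Pairwise corrections: only rows in which both coordinates are known.
    a[1:, 1:] = (fx[:, 1:, None] * fy[:, None, 1:]).mean(axis=0)
    return a

def density(a, x, y):
    """Evaluate the modelled density (it may dip below zero in places)."""
    m = a.shape[0]
    return np.einsum('...j,jk,...k->...', basis(x, m), a, basis(y, m))

def impute_y(a, x):
    """Expected value of a missing y given a known x under the model."""
    m = a.shape[0]
    moments = np.zeros(m)                        # integrals of y * f_k(y) over [0, 1]
    moments[0] = 0.5
    if m > 1:
        moments[1] = np.sqrt(3.0) / 6.0          # higher k integrate to zero
    fx = basis(x, m)
    return (fx @ a @ moments) / (fx @ a[:, 0])   # normalise by the density of x

# Synthetic, positively correlated sample with some values knocked out.
rng = np.random.default_rng(0)
x = rng.uniform(size=2000)
y = np.clip(x + rng.normal(0.0, 0.1, size=2000), 0.0, 1.0)
sample = np.column_stack([x, y])
sample[rng.uniform(size=2000) < 0.3, 1] = np.nan   # y missing in about 30% of rows
sample[rng.uniform(size=2000) < 0.1, 0] = np.nan   # x missing in about 10% of rows

a = fit_pair(sample)
print(impute_y(a, np.array([0.2, 0.5, 0.8])))
```

Because every coefficient is an independent average, rows with a missing coordinate still contribute to the marginal terms and are skipped only for the pairwise terms. The denominator in `impute_y` can approach zero where the modelled marginal density is small, so a practical version would need to guard against that case.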
