Paper

AutoGMM: Automatic and Hierarchical Gaussian Mixture Modeling in Python

Background: Gaussian mixture modeling is a fundamental tool in clustering, as well as discriminant analysis and semiparametric density estimation. However, estimating the optimal model for any given number of components is an NP-hard problem, and estimating the number of components is in some respects an even harder problem. Findings: In R, a popular package called mclust addresses both of these problems. However, Python has lacked such a package. We therefore introduce AutoGMM, a Python algorithm for automatic Gaussian mixture modeling, and its hierarchical version, HGMM. AutoGMM builds upon scikit-learn's AgglomerativeClustering and GaussianMixture classes, with certain modifications to make the results more stable. Empirically, on several different applications, AutoGMM performs approximately as well as mclust, and sometimes better. Conclusions: AutoMM, a freely available Python package, enables efficient Gaussian mixture modeling by automatically selecting the initialization, number of clusters and covariance constraints.

Results in Papers With Code
(↓ scroll down to see all results)