AutoGMM: Automatic and Hierarchical Gaussian Mixture Modeling in Python

6 Sep 2019  ·  Thomas L. Athey, Tingshan Liu, Benjamin D. Pedigo, Joshua T. Vogelstein ·

Background: Gaussian mixture modeling is a fundamental tool in clustering, as well as discriminant analysis and semiparametric density estimation. However, estimating the optimal model for any given number of components is an NP-hard problem, and estimating the number of components is in some respects an even harder problem. Findings: In R, a popular package called mclust addresses both of these problems. However, Python has lacked such a package. We therefore introduce AutoGMM, a Python algorithm for automatic Gaussian mixture modeling, and its hierarchical version, HGMM. AutoGMM builds upon scikit-learn's AgglomerativeClustering and GaussianMixture classes, with certain modifications to make the results more stable. Empirically, on several different applications, AutoGMM performs approximately as well as mclust, and sometimes better. Conclusions: AutoMM, a freely available Python package, enables efficient Gaussian mixture modeling by automatically selecting the initialization, number of clusters and covariance constraints.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here