Git: Clustering Based on Graph of Intensity Topology
\textbf{A}ccuracy, \textbf{R}obustness to noise and scale, \textbf{I}nterpretability, \textbf{S}peed, and \textbf{E}ase of use (ARISE) are crucial requirements of a good clustering algorithm. However, achieving all of these goals simultaneously is challenging, and most advanced approaches address only some of them. To consider these aspects together, we propose a novel clustering algorithm, GIT (Clustering Based on \textbf{G}raph of \textbf{I}ntensity \textbf{T}opology). GIT considers both local and global data structures: it first forms local clusters based on the intensity peaks of samples, and then estimates the global topological graph (topo-graph) between these local clusters. We use the Wasserstein distance between the predicted and prior class proportions to automatically cut noisy edges in the topo-graph and merge connected local clusters into final clusters. We compare GIT with seven competing algorithms on five synthetic datasets and nine real-world datasets. With fast local cluster detection, robust topo-graph construction, and accurate edge-cutting, GIT shows attractive ARISE performance and significantly exceeds other non-convex clustering methods. For example, GIT outperforms its counterparts by about $10\%$ (F1-score) on MNIST and FashionMNIST. Code is available at https://github.com/gaozhangyang/GIT.
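The edge-cutting step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a simplified reading that assumes the topo-graph is given as weighted edges between local clusters, that candidate cuts are thresholds on edge weight, and that class proportions are compared after sorting them in descending order. The names `wasserstein_1d`, `merge_components`, `sorted_proportions`, and `cut_noisy_edges` are hypothetical.

```python
from itertools import accumulate
from math import inf


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions on the same
    unit-spaced ordered support: the summed gap between their CDFs."""
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def merge_components(n_nodes, edges):
    """Label each node by the root of its connected component (union-find)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    return [find(i) for i in range(n_nodes)]


def sorted_proportions(sizes, labels):
    """Fraction of samples in each merged cluster, largest first."""
    totals = {}
    for size, label in zip(sizes, labels):
        totals[label] = totals.get(label, 0) + size
    n = sum(sizes)
    return sorted((t / n for t in totals.values()), reverse=True)


def cut_noisy_edges(sizes, edges, prior):
    """Assumed procedure: try each edge weight as a cut threshold and keep
    the one whose merged-cluster proportions are closest to the prior."""
    prior = sorted(prior, reverse=True)
    best = (inf, None, None)  # (distance, threshold, labels)
    for t in sorted({w for _, _, w in edges}) + [inf]:
        kept = [e for e in edges if e[2] >= t]
        labels = merge_components(len(sizes), kept)
        pred = sorted_proportions(sizes, labels)
        width = max(len(pred), len(prior))
        padded_pred = pred + [0.0] * (width - len(pred))
        padded_prior = prior + [0.0] * (width - len(prior))
        dist = wasserstein_1d(padded_pred, padded_prior)
        if dist < best[0]:
            best = (dist, t, labels)
    return best[2], best[1]


# Four local clusters (by sample count) joined by two strong edges and one weak one.
sizes = [50, 50, 60, 40]
edges = [(0, 1, 0.9), (2, 3, 0.8), (1, 2, 0.1)]
labels, threshold = cut_noisy_edges(sizes, edges, prior=[0.5, 0.5])
```

With a balanced two-class prior, the sketch drops the weak 0.1 edge and keeps the two strong ones, so local clusters 0 and 1 merge into one final cluster and 2 and 3 into another.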
Results from the Paper
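Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Clustering Algorithms Evaluation | Fashion-MNIST | AE+GIT | F1-score | 65% | # 1 |
Clustering Algorithms Evaluation | Fashion-MNIST | AE+GIT | ARI | 49% | # 1 |
Clustering Algorithms Evaluation | Fashion-MNIST | AE+GIT | NMI | 61% | # 1 |
Clustering Algorithms Evaluation | Fashion-MNIST | GIT | F1-score | 56% | # 2 |
Clustering Algorithms Evaluation | Fashion-MNIST | GIT | ARI | 32% | # 4 |
Clustering Algorithms Evaluation | Fashion-MNIST | GIT | NMI | 51% | # 2 |
Clustering Algorithms Evaluation | Fashion-MNIST | k-Means++ | F1-score | 39% | # 6 |
Clustering Algorithms Evaluation | Fashion-MNIST | k-Means++ | ARI | 35% | # 2 |
Clustering Algorithms Evaluation | Fashion-MNIST | k-Means++ | NMI | 51% | # 2 |
Clustering Algorithms Evaluation | Fashion-MNIST | Spectral Clustering | F1-score | 43% | # 4 |
Clustering Algorithms Evaluation | Fashion-MNIST | Spectral Clustering | ARI | 34% | # 3 |
Clustering Algorithms Evaluation | Fashion-MNIST | Spectral Clustering | NMI | 49% | # 4 |
Clustering Algorithms Evaluation | Fashion-MNIST | QuickShiftPP | F1-score | 42% | # 5 |
Clustering Algorithms Evaluation | Fashion-MNIST | QuickShiftPP | ARI | 16% | # 6 |
Clustering Algorithms Evaluation | Fashion-MNIST | QuickShiftPP | NMI | 41% | # 6 |
Clustering Algorithms Evaluation | Fashion-MNIST | SpectACI | F1-score | 47% | # 3 |
Clustering Algorithms Evaluation | Fashion-MNIST | SpectACI | ARI | 29% | # 5 |
Clustering Algorithms Evaluation | Fashion-MNIST | SpectACI | NMI | 45% | # 5 |
Clustering Algorithms Evaluation | MNIST | AE+GIT | F1-score | 88% | # 1 |
Clustering Algorithms Evaluation | MNIST | AE+GIT | ARI | 77% | # 1 |
Clustering Algorithms Evaluation | MNIST | AE+GIT | NMI | 81% | # 1 |
Clustering Algorithms Evaluation | MNIST | k-Means++ | F1-score | 50% | # 3 |
Clustering Algorithms Evaluation | MNIST | k-Means++ | ARI | 36% | # 3 |
Clustering Algorithms Evaluation | MNIST | k-Means++ | NMI | 45% | # 3 |
Clustering Algorithms Evaluation | MNIST | GIT | F1-score | 59% | # 2 |
Clustering Algorithms Evaluation | MNIST | GIT | ARI | 42% | # 2 |
Clustering Algorithms Evaluation | MNIST | GIT | NMI | 53% | # 2 |
Clustering Algorithms Evaluation | MNIST | Spectral Clustering | F1-score | 41% | # 5 |
Clustering Algorithms Evaluation | MNIST | Spectral Clustering | ARI | 33% | # 4 |
Clustering Algorithms Evaluation | MNIST | Spectral Clustering | NMI | 44% | # 5 |
Clustering Algorithms Evaluation | MNIST | QuickShiftPP | F1-score | 45% | # 4 |
Clustering Algorithms Evaluation | MNIST | QuickShiftPP | ARI | 13% | # 6 |
Clustering Algorithms Evaluation | MNIST | QuickShiftPP | NMI | 45% | # 3 |
Clustering Algorithms Evaluation | MNIST | SpectACI | F1-score | 40% | # 6 |
Clustering Algorithms Evaluation | MNIST | SpectACI | ARI | 17% | # 5 |
Clustering Algorithms Evaluation | MNIST | SpectACI | NMI | 33% | # 6 |
Clustering Algorithms Evaluation | Olivetti face | k-Means++ | F1-score | 52% | # 3 |
Clustering Algorithms Evaluation | Olivetti face | k-Means++ | NMI | 74% | # 3 |
Clustering Algorithms Evaluation | Olivetti face | k-Means++ | ARI | 38% | # 2 |
Clustering Algorithms Evaluation | Olivetti face | Spectral Clustering | F1-score | 37% | # 4 |
Clustering Algorithms Evaluation | Olivetti face | Spectral Clustering | NMI | 66% | # 4 |
Clustering Algorithms Evaluation | Olivetti face | Spectral Clustering | ARI | 19% | # 5 |
Clustering Algorithms Evaluation | Olivetti face | QuickShiftPP | F1-score | 60% | # 2 |
Clustering Algorithms Evaluation | Olivetti face | QuickShiftPP | NMI | 79% | # 1 |
Clustering Algorithms Evaluation | Olivetti face | QuickShiftPP | ARI | 38% | # 2 |
Clustering Algorithms Evaluation | Olivetti face | SpectACI | F1-score | 34% | # 5 |
Clustering Algorithms Evaluation | Olivetti face | SpectACI | NMI | 61% | # 5 |
Clustering Algorithms Evaluation | Olivetti face | SpectACI | ARI | 21% | # 4 |
Clustering Algorithms Evaluation | Olivetti face | GIT | F1-score | 62% | # 1 |
Clustering Algorithms Evaluation | Olivetti face | GIT | NMI | 78% | # 2 |
Clustering Algorithms Evaluation | Olivetti face | GIT | ARI | 45% | # 1 |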
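
For readers unfamiliar with the ARI column, the following is a minimal sketch of the standard adjusted Rand index computed from pair counts. It is not taken from the paper's evaluation code, and it assumes the percentages above are the raw score multiplied by 100. The function name `adjusted_rand_index` is chosen here for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index: pair-counting agreement corrected for chance."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of samples placed together in both partitions.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each partition separately.
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / comb(n, 2)
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every sample is its own cluster in both
    return (same_both - expected) / (maximum - expected)


# Cluster IDs are arbitrary, so a pure relabeling still scores 1.0.
score = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because the index is corrected for chance, it can be negative when two partitions agree less than random assignment would, which is why ARI values in the table tend to sit below the corresponding NMI values.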