Improving k-Means Clustering Performance with Disentangled Internal Representations

5 Jun 2020 · Abien Fred Agarap, Arnulfo P. Azcarraga

Deep clustering algorithms combine representation learning and clustering by jointly optimizing a clustering loss and a non-clustering loss. In such methods, a deep neural network is used for representation learning together with a clustering network. Instead of following this framework to improve clustering performance, we propose a simpler approach of optimizing the entanglement of the learned latent code representation of an autoencoder. We define entanglement as how close pairs of points from the same class or structure are, relative to pairs of points from different classes or structures. To measure the entanglement of data points, we use the soft nearest neighbor loss, and expand it by introducing an annealing temperature factor. Using our proposed approach, the test clustering accuracy was 96.2% on the MNIST dataset, 85.6% on the Fashion-MNIST dataset, and 79.2% on the EMNIST Balanced dataset, outperforming our baseline models.
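
To make the entanglement measure concrete, here is a minimal NumPy sketch of a soft nearest neighbor loss computed over a batch of latent codes. It assumes the commonly used formulation with a squared Euclidean distance kernel. The function names, the `eps` stabilizer, and the power-law form of `annealed_temperature` (including its `initial` and `decay` defaults) are illustrative assumptions, not the authors' exact implementation or schedule.

```python
import numpy as np


def soft_nearest_neighbor_loss(features, labels, temperature=1.0, eps=1e-9):
    """Batch soft nearest neighbor loss (sketch; squared Euclidean kernel assumed).

    Low values mean same-class points sit closer together than
    different-class points, i.e. the representation is less entangled.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances, shape (batch, batch).
    diffs = features[:, None, :] - features[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)

    # Temperature-scaled similarities; a point is never its own neighbor.
    sims = np.exp(-sq_dists / temperature)
    np.fill_diagonal(sims, 0.0)

    # Similarity mass on same-class neighbors vs. on all neighbors.
    same_class = labels[:, None] == labels[None, :]
    numerator = np.sum(sims * same_class, axis=1)
    denominator = np.sum(sims, axis=1)

    # eps guards against log(0) and division by zero.
    return float(-np.mean(np.log(eps + numerator / (eps + denominator))))


def annealed_temperature(epoch, initial=1.0, decay=0.55):
    """Hypothetical annealing schedule: temperature shrinks as training proceeds."""
    return initial / (1.0 + epoch) ** decay
```

In a training loop, this term would typically be added to the autoencoder's reconstruction loss, with `temperature=annealed_temperature(epoch)` recomputed at each epoch.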

Results from the Paper

Task              Dataset          Model    Metric Name  Metric Value  Global Rank
Image Clustering  EMNIST-Balanced  AE+SNNL  NMI          0.783         # 1
                                            Accuracy     0.792         # 1
Image Clustering  Fashion-MNIST    AE+SNNL  Accuracy     0.856         # 1
                                            NMI          0.767         # 1
Image Clustering  MNIST-test       AE+SNNL  NMI          0.903         # 8
                                            Accuracy     0.962         # 6
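
The listing does not say how the Accuracy values are computed. Clustering accuracy is usually taken under the best one-to-one mapping between cluster ids and class labels, and the sketch below assumes that convention. The function name `clustering_accuracy` is illustrative, and the exhaustive search over mappings is practical only for a handful of clusters. For 10 or more clusters, an assignment solver such as the Hungarian algorithm would replace the permutation loop.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one cluster-to-class mapping (brute force)."""
    if len(true_labels) != len(cluster_ids) or len(true_labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    clusters = sorted(set(cluster_ids))
    classes = sorted(set(true_labels))

    # Pad with placeholders so every cluster can receive a distinct target,
    # even when there are more clusters than classes.
    targets = classes + [None] * max(0, len(clusters) - len(classes))

    # counts[(cluster, label)] = number of points with that pairing.
    counts = Counter(zip(cluster_ids, true_labels))

    best_hits = 0
    for assignment in permutations(targets, len(clusters)):
        hits = sum(counts[(c, t)] for c, t in zip(clusters, assignment))
        best_hits = max(best_hits, hits)

    return best_hits / len(true_labels)
```

For example, `clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0, since relabeling the clusters recovers the classes exactly.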