Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs

WS 2018  ·  Khiem Pham, Guangliang Chen ·

Spectral clustering has received a lot of attention due to its ability to separate nonconvex, non-intersecting manifolds, but its high computational complexity has significantly limited its applicability. Motivated by the document-term co-clustering framework by Dhillon (2001), we propose a landmark-based scalable spectral clustering approach in which we first use the selected landmark set and the given data to form a bipartite graph and then run a diffusion process on it to obtain a family of diffusion coordinates for clustering. We show that our proposed algorithm can be implemented based on very efficient operations on the affinity matrix between the given data and selected landmarks, thus capable of handling large data. Finally, we demonstrate the excellent performance of our method by comparing with the state-of-the-art scalable algorithms on several benchmark data sets.

PDF Abstract

Datasets


Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Image/Document Clustering pendigits LBDM Accuracy (%) 74.70 # 4
runtime (s) 3.08 # 7

Methods