Divide-and-conquer based Large-Scale Spectral Clustering
Spectral clustering is one of the most popular clustering methods. However, balancing the efficiency and effectiveness of large-scale spectral clustering under limited computing resources has long remained an open problem. In this paper, we propose a divide-and-conquer based large-scale spectral clustering method that strikes a good balance between efficiency and effectiveness. In the proposed method, a divide-and-conquer based landmark selection algorithm and a novel approximate similarity matrix approach are designed to construct a sparse similarity matrix at low computational complexity. The clustering result can then be computed quickly through a bipartite graph partition process. The proposed method achieves a lower computational complexity than most existing large-scale spectral clustering methods. Experimental results on ten large-scale datasets demonstrate the efficiency and effectiveness of the proposed method. The MATLAB code of the proposed method and the experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.
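
To make the sparse similarity matrix mentioned in the abstract more concrete, here is a minimal Python sketch of a generic landmark-based construction: each data point keeps Gaussian similarities to only its k nearest landmarks, so each row of the point-to-landmark matrix has k nonzero entries. This is an illustration under stated assumptions, not the authors' MATLAB implementation. Random sampling stands in for the divide-and-conquer landmark selection, an exact brute-force neighbor search stands in for the approximate similarity approach, and the function names, the Gaussian kernel, and the bandwidth rule are all hypothetical choices.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sparse_landmark_similarity(points, landmarks, k):
    """Return one dict per point: {landmark index: Gaussian weight}.

    Only the k nearest landmarks of each point are kept, so every row of
    the point-to-landmark similarity matrix has exactly k nonzero entries.
    """
    # Brute-force search for the k nearest landmarks of every point
    # (assumption: a simple exact baseline, not the paper's approximation).
    nearest = []
    for p in points:
        dists = sorted((squared_distance(p, lm), j) for j, lm in enumerate(landmarks))
        nearest.append(dists[:k])

    # Kernel bandwidth: mean distance over all kept point-landmark pairs
    # (assumption: one common heuristic among several).
    kept = [math.sqrt(d2) for row in nearest for d2, _ in row]
    sigma = sum(kept) / len(kept) or 1.0

    return [
        {j: math.exp(-d2 / (2.0 * sigma ** 2)) for d2, j in row}
        for row in nearest
    ]


# Toy usage: two 2-D blobs, 10 randomly sampled landmarks, 3 neighbors each.
random.seed(0)
points = [
    (random.gauss(cx, 0.3), random.gauss(cy, 0.3))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0)]
    for _ in range(50)
]
landmarks = random.sample(points, 10)
B = sparse_landmark_similarity(points, landmarks, k=3)
print(len(B), len(B[0]))
```

The rows of `B` can be read as the edges of a bipartite graph between points and landmarks, which is the kind of structure a bipartite graph partition step would then operate on.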

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Image/Document Clustering | pendigits | DnC-SC | Accuracy (%) | 82.27 | # 1
Image/Document Clustering | pendigits | DnC-SC | Runtime (s) | 0.64 | # 1
Image/Document Clustering | pendigits | DnC-SC | NMI | 82.86 | # 1
Image Clustering | pendigits | DnC-SC | Accuracy | 0.8201 | # 2
Image Clustering | pendigits | DnC-SC | NMI | 0.8201 | # 2
Image Clustering | USPS | DnC-SC | NMI | 0.8286 | # 14
Image Clustering | USPS | DnC-SC | Accuracy | 0.8255 | # 12
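
The table above lists NMI (normalized mutual information) values that appear to be on two scales, a percentage and a fraction in [0, 1]. As a reading aid, the following Python sketch computes NMI from two label lists. It assumes the geometric-mean normalization, NMI = I(U; V) / sqrt(H(U) H(V)); the leaderboard may use a different normalization, and the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """NMI between two labelings, normalized by sqrt(H(true) * H(pred))."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    # Mutual information, summed over the label pairs that actually occur.
    mutual_info = 0.0
    for (t, p), n_tp in count_joint.items():
        mutual_info += (n_tp / n) * math.log(n * n_tp / (count_true[t] * count_pred[p]))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true, h_pred = entropy(count_true), entropy(count_pred)
    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case: at least one labeling has a single cluster.
        # Convention assumed here: 1.0 if both are trivial, else 0.0.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)


# NMI ignores how clusters are named: a relabeled perfect clustering scores 1.
print(normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))
```

Multiplying the result by 100 gives the percentage form.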