Supporting Clustering with Contrastive Learning

Unsupervised clustering aims to discover the semantic categories of data according to a distance measured in the representation space. However, at the beginning of the learning process different categories often overlap with each other in that space, which makes it difficult for distance-based clustering to separate them well. To this end, we propose Supporting Clustering with Contrastive Learning (SCCL), a novel framework that leverages contrastive learning to promote better separation. We assess the performance of SCCL on short text clustering and show that SCCL significantly advances the state-of-the-art results on most benchmark datasets, with 3%-11% improvement in Accuracy and 4%-15% improvement in Normalized Mutual Information. Furthermore, our quantitative analysis demonstrates the effectiveness of SCCL in leveraging the strengths of both bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances when evaluated with the ground-truth cluster labels.
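The abstract combines bottom-up instance discrimination (contrastive learning) with top-down, distance-based clustering. The NumPy sketch below shows one way such a joint objective could be written. It assumes an NT-Xent instance-wise contrastive loss over two augmented views of each text and a DEC-style clustering loss (Student's t soft assignment to cluster centers, sharpened into a target distribution). Neither choice, nor the balancing weight `eta`, is stated in the abstract, and all function names are hypothetical.

```python
import numpy as np


def nt_xent(z1, z2, tau=0.5):
    """Instance-wise contrastive loss (NT-Xent) for paired views.

    z1[i] and z2[i] are embeddings of two augmentations of text i; every other
    embedding in the batch acts as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine sim
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own positive/negative
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # partner view index
    # Numerically stable log-softmax over each row, evaluated at the positive.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return -np.mean(sim[np.arange(2 * n), pos] - log_denom)


def soft_assign(x, centers, alpha=1.0):
    """Student's t similarity between embeddings and cluster centers (DEC-style)."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpen q and reweight by cluster frequency; treated as a fixed target."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def cluster_kl(q):
    """KL(P || Q) averaged over samples, with the target P derived from Q."""
    p = target_distribution(q)
    return np.mean(np.sum(p * np.log(p / q), axis=1))


def joint_loss(x, z1, z2, centers, eta=1.0):
    """Hypothetical combined objective: clustering term + eta * contrastive term."""
    return cluster_kl(soft_assign(x, centers)) + eta * nt_xent(z1, z2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))                # stand-in text embeddings
    z1 = x + 0.01 * rng.normal(size=x.shape)    # two lightly perturbed "views"
    z2 = x + 0.01 * rng.normal(size=x.shape)
    centers = rng.normal(size=(3, 16))          # stand-in cluster centers
    print(joint_loss(x, z1, z2, centers))
```

In an actual model the embeddings would come from a trainable text encoder and the objective would be minimized by gradient descent; NumPy here only evaluates the loss, so the sketch illustrates the shape of the objective rather than a training procedure.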

NAACL 2021

Acc = clustering accuracy (%)

Task                   Dataset         Model  Metric    Value         Global rank
Short Text Clustering  AG News         SCCL   Acc       88.2          #1
Short Text Clustering  Biomedical      SCCL   Acc       46.2          #1
Short Text Clustering  GoogleNews-S    SCCL   Acc       83.1          #1
Short Text Clustering  GoogleNews-T    SCCL   Acc       75.8          #1
Short Text Clustering  GoogleNews-TS   SCCL   Acc       89.8          #1
Short Text Clustering  Searchsnippets  SCCL   Acc       85.2          #1
Short Text Clustering  Stackoverflow   SCCL   Accuracy  (not listed)  #1
Short Text Clustering  Stackoverflow   SCCL   Acc       (not listed)  #1
Short Text Clustering  Tweet           SCCL   Acc       78.2          #1
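Acc in short text clustering is conventionally computed as the accuracy under the best one-to-one mapping between predicted clusters and ground-truth classes; the abstract also reports Normalized Mutual Information (NMI). The following is a minimal sketch of both metrics, assuming integer labels. The helper names are hypothetical. The permutation search is only practical for a handful of clusters (the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, is the usual choice at scale), and NMI is normalized here by the arithmetic mean of the two entropies, which is one of several conventions in use.

```python
import itertools

import numpy as np


def cluster_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to class ids.

    Brute force over permutations, so only practical for a few clusters.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    k = max(len(classes), len(clusters))
    # counts[i, j] = number of points in cluster i whose true class is j
    # (zero-padded to k x k when the numbers of clusters and classes differ).
    counts = np.zeros((k, k), dtype=int)
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            counts[i, j] = np.sum((y_pred == c) & (y_true == t))
    best = max(
        sum(counts[i, perm[i]] for i in range(k))
        for perm in itertools.permutations(range(k))
    )
    return best / len(y_true)


def _entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def nmi(y_true, y_pred):
    """Mutual information normalized by the mean of the two label entropies."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, c = np.unique(np.asarray(y_pred), return_inverse=True)
    joint = np.zeros((t.max() + 1, c.max() + 1))
    np.add.at(joint, (t, c), 1.0)  # contingency counts
    joint /= joint.sum()           # joint distribution of (class, cluster)
    pt, pc = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pt, pc)[nz]))
    denom = 0.5 * (_entropy(pt) + _entropy(pc))
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    print(cluster_accuracy(y_true, [1, 1, 0, 0, 2, 2]))  # relabelled clusters
    print(cluster_accuracy(y_true, [0, 0, 0, 1, 1, 1]))
    print(nmi(y_true, [1, 1, 0, 0, 2, 2]))
```

Both metrics are invariant to how cluster ids are numbered, which is why a clustering that only permutes the class labels scores perfectly.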
