Text Clustering

32 papers with code • 3 benchmarks • 5 datasets

Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar (in some sense) to each other than to texts in other groups. (Source: Adapted from Wikipedia)

Most implemented papers

A Self-Training Approach for Short Text Clustering

hadifar/stc_clustering WS 2019

Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts.
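
The sparsity problem described above can be illustrated with a minimal sketch. It is not code from the paper or its repository: the helper names `tfidf_vectors` and `sparsity`, the whitespace tokenizer, the smoothed IDF formula, and the four example queries are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build dense TF-IDF rows for whitespace-tokenized texts (illustrative helper)."""
    docs = [Counter(t.lower().split()) for t in texts]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Document frequency: in how many texts each word occurs.
    df = Counter(w for d in docs for w in d)
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    rows = [[d[w] * idf[w] for w in vocab] for d in docs]
    return vocab, rows


def sparsity(rows):
    """Fraction of entries that are exactly zero."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0)
    return zeros / total


# Hypothetical short texts (search-query style), invented for this example.
short_texts = [
    "cheap flights to paris",
    "python list sort error",
    "best pizza near me",
    "reset router password",
]

vocab, rows = tfidf_vectors(short_texts)
print(len(vocab), "vocabulary terms")
print(f"{sparsity(rows):.0%} of the matrix entries are zero")
```

Because these texts share no words, every pair of rows has zero overlap, so a similarity measure computed on this representation gives a clustering algorithm very little to work with.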

Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement

thuiar/CDAC-plus 20 Nov 2019

Identifying new user intents is an essential task in dialogue systems.

Enhancement of Short Text Clustering by Iterative Classification

rashadulrakib/short-text-clustering-enhancement 31 Jan 2020

Short text clustering is a challenging task due to the lack of signal contained in such short texts.

Neural Topic Modeling with Bidirectional Adversarial Training

zll17/Neural_Topic_Models ACL 2020

Recent years have witnessed a surge of interest in using neural topic models for automatic topic extraction from text, since they avoid the complicated mathematical derivations for model inference required by traditional topic models such as Latent Dirichlet Allocation (LDA).

ComStreamClust: a communicative multi-agent approach to text clustering in streaming data

AliNajafi1998/ComStream 11 Oct 2020

Topic detection is the task of determining and tracking hot topics in social media.

Efficient Sparse Spherical k-Means for Document Clustering

johpro/esp-kmeans 30 Jul 2021

Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient.
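
For readers unfamiliar with the baseline, the following is a minimal sketch of textbook spherical k-means in plain Python. It is not the efficient sparse variant proposed in the paper, and nothing here is taken from the linked repository: the function names, the fixed initial centroids, and the toy term-count vectors are assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, init_centroids, max_iter=20):
    """Cluster by cosine similarity: points and centroids live on the unit sphere."""
    points = [normalize(v) for v in vectors]
    centroids = [normalize(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: pick the centroid with the highest cosine similarity.
        new_labels = [
            max(range(len(centroids)), key=lambda j: dot(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Hypothetical term counts over the vocabulary ["goal", "match", "stock", "market"].
docs = [
    [2, 1, 0, 0],
    [1, 3, 0, 0],
    [0, 0, 2, 2],
    [0, 1, 1, 3],
]

labels, centroids = spherical_kmeans(docs, init_centroids=[docs[0], docs[2]])
print(labels)
```

Normalizing both points and centroids reduces the assignment step to a dot product, which is part of why the method is cheap on sparse document vectors. A practical implementation would use sparse data structures and a randomized initialization instead of the fixed starting centroids assumed here.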

Translation Transformers Rediscover Inherent Data Domains

tartunlp/inherent-domains-wmt21 WMT (EMNLP) 2021

Here we analyze the sentence representations learned by NMT Transformers and show that these explicitly include information on text domains, even after only seeing the input sentences without domain labels.

Proposition-Level Clustering for Multi-Document Summarization

oriern/procluster ACL ARR January 2022

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.

Subspace Co-clustering with Two-Way Graph Convolution

chakib401/SC3 CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management 2022

We first extend the concept of subspace clustering to co-clustering, which has been extensively used on document-term matrices due to the resulting interplay between the document and term representations.

EASE: Entity-Aware Contrastive Learning of Sentence Embedding

studio-ousia/ease NAACL 2022

We present EASE, a novel method for learning sentence embeddings via contrastive learning between sentences and their related entities.