Text Clustering
32 papers with code • 3 benchmarks • 5 datasets
Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar to each other, under some chosen similarity measure, than to texts in other clusters. (Source: Adapted from Wikipedia)
Most implemented papers
A Self-Training Approach for Short Text Clustering
Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts.
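The sparsity problem mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the four example texts, the helper names (`tfidf_vectors`, `cosine`, `density`), and the smoothed IDF formula are all assumptions chosen for illustration. It shows that two short texts about the same topic can share no terms at all, so their TF-IDF cosine similarity is zero.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, dense TF-IDF vectors) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF; this is one common variant, assumed here for illustration.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(vectors):
    """Fraction of entries that are non-zero across all vectors."""
    total = sum(len(v) for v in vectors)
    nonzero = sum(1 for v in vectors for x in v if x != 0)
    return nonzero / total


# Hypothetical short texts: the first two are about travel, the last two about code.
docs = [
    "cheap flights to paris",
    "low cost airfare france",
    "python list sort",
    "order array elements in java",
]
vocab, vectors = tfidf_vectors(docs)

print("vocabulary size:", len(vocab))
print("matrix density:", density(vectors))
print("cosine(travel 1, travel 2):", cosine(vectors[0], vectors[1]))
```

In this toy corpus every term occurs in exactly one text, so a distance-based clustering algorithm has no signal for grouping the two travel queries. That is the kind of gap that learned dense representations are meant to close.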
Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement
Identifying new user intents is an essential task in dialogue systems.
Enhancement of Short Text Clustering by Iterative Classification
Short text clustering is a challenging task due to the lack of signal contained in such short texts.
Neural Topic Modeling with Bidirectional Adversarial Training
Recent years have witnessed a surge of interest in using neural topic models for automatic topic extraction from text, since they avoid the complicated mathematical derivations for model inference required by traditional topic models such as Latent Dirichlet Allocation (LDA).
ComStreamClust: a communicative multi-agent approach to text clustering in streaming data
Topic detection is the task of determining and tracking hot topics in social media.
Efficient Sparse Spherical k-Means for Document Clustering
Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient.
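For readers unfamiliar with the method, the following is a minimal sketch of plain spherical k-means. It is not the efficient sparse variant the paper proposes. Vectors are projected onto the unit sphere, each point is assigned to the centroid with the highest cosine similarity, and each centroid is recomputed as the normalized sum of its members. The function names, the random initialization, and the toy 2-D data are assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(points, k, n_iter=20, seed=0):
    """Cluster points by cosine similarity; returns (labels, unit-norm centroids)."""
    rng = random.Random(seed)
    data = [normalize(p) for p in points]
    # Assumed initialization: k distinct data points chosen at random.
    centroids = [list(c) for c in rng.sample(data, k)]
    labels = [-1] * len(data)
    for _ in range(n_iter):
        # Assignment step: on unit vectors, the dot product is the cosine similarity.
        new_labels = [
            max(range(k), key=lambda j: dot(x, centroids[j])) for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not change
        labels = new_labels
        # Update step: centroid = normalized sum of the cluster's members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                summed = [sum(col) for col in zip(*members)]
                centroids[j] = normalize(summed)
    return labels, centroids


# Toy data: three vectors pointing roughly along x, three roughly along y.
points = [[1, 0.1], [2, 0.1], [5, 0.2], [0.1, 1], [0.2, 3], [0.1, 4]]
labels, centroids = spherical_kmeans(points, k=2)
print(labels)
```

Because only direction matters, vectors of very different lengths (here `[1, 0.1]` and `[5, 0.2]`) end up in the same cluster. This property is one reason the method suits documents of varying length.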
Translation Transformers Rediscover Inherent Data Domains
Here we analyze the sentence representations learned by NMT Transformers and show that these explicitly include information about text domains, even after only seeing the input sentences without domain labels.
Proposition-Level Clustering for Multi-Document Summarization
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
Subspace Co-clustering with Two-Way Graph Convolution
We first extend the concept of subspace clustering to co-clustering, which has been extensively used on document-term matrices due to the resulting interplay between the document and term representations.
EASE: Entity-Aware Contrastive Learning of Sentence Embedding
We present EASE, a novel method for learning sentence embeddings via contrastive learning between sentences and their related entities.