Text Clustering

32 papers with code • 3 benchmarks • 5 datasets

Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar to each other, under some chosen similarity measure, than to texts in other clusters. (Source: Adapted from Wikipedia)

Benchmarks

These leaderboards track progress in text clustering.

Dataset	Best Model
MTEB	ST5-XXL
20 Newsgroups	G-BAT
Urdu News Headlines Dataset	Vector Space Model

Most implemented papers

A Self-Training Approach for Short Text Clustering

hadifar/stc_clustering • WS 2019

Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts.
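The sparsity problem mentioned above can be illustrated with a small sketch. It is not taken from the paper: the function names `tfidf_vectors` and `sparsity`, the whitespace tokenizer, the smoothed IDF formula, and the four example texts are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Return the sorted vocabulary and one dense TF-IDF vector per text."""
    # Assumption: lowercase whitespace tokenization is good enough for a toy example.
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    # Document frequency: the number of texts that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumed variant) keeps every present word's weight positive.
        vectors.append(
            [tf[w] * (math.log((1 + n_docs) / (1 + df[w])) + 1.0) for w in vocab]
        )
    return vocab, vectors

def sparsity(vectors):
    """Fraction of entries that are exactly zero."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0)
    return zeros / total

# Hypothetical short texts; each has four words and they share almost no vocabulary.
short_texts = [
    "cheap flights to paris",
    "book hotel in rome",
    "reset my account password",
    "paris weather this weekend",
]
vocab, vectors = tfidf_vectors(short_texts)
zero_fraction = sparsity(vectors)
print(f"vocabulary size: {len(vocab)}, zero fraction: {zero_fraction:.2f}")
```

Because each text touches only a handful of vocabulary entries, most coordinates are zero and two texts rarely share a non-zero dimension, which is what makes distance-based clustering on such vectors unreliable.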

Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement

thuiar/CDAC-plus • 20 Nov 2019

Identifying new user intents is an essential task in dialogue systems.

Enhancement of Short Text Clustering by Iterative Classification

rashadulrakib/short-text-clustering-enhancement • 31 Jan 2020

Short text clustering is a challenging task due to the lack of signal contained in such short texts.

Neural Topic Modeling with Bidirectional Adversarial Training

zll17/Neural_Topic_Models • ACL 2020

Recent years have witnessed a surge of interest in using neural topic models for automatic topic extraction from text, since they avoid the complicated mathematical derivations for model inference required by traditional topic models such as Latent Dirichlet Allocation (LDA).

ComStreamClust: a communicative multi-agent approach to text clustering in streaming data

AliNajafi1998/ComStream • 11 Oct 2020

Topic detection is the task of determining and tracking hot topics in social media.

Efficient Sparse Spherical k-Means for Document Clustering

johpro/esp-kmeans • 30 Jul 2021

Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient.
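For readers unfamiliar with the technique, here is a minimal dense sketch of plain spherical k-means, which clusters by cosine similarity on unit-length vectors. It is not the efficient sparse variant proposed in the paper: the function names, the fixed `init_indices` seeding, and the toy term-count vectors are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(vectors, init_indices, max_iter=20):
    """Cluster vectors by cosine similarity; centroids are kept on the unit sphere."""
    data = [normalize(v) for v in vectors]
    # Assumption: initial centroids are picked by index so the run is deterministic.
    centroids = [data[i] for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: on unit vectors the dot product equals cosine similarity.
        new_labels = [
            max(range(len(centroids)), key=lambda c: dot(x, centroids[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: sum the members of each cluster and re-normalize.
        for c in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids

# Hypothetical term-count vectors over a four-word vocabulary.
docs = [
    [2, 1, 0, 0],
    [4, 2, 0, 0],  # same direction as the first document, twice the length
    [0, 0, 1, 3],
    [0, 0, 2, 5],
    [1, 1, 0, 0],
]
labels, centroids = spherical_kmeans(docs, init_indices=[0, 2])
print(labels)
```

Normalizing both documents and centroids means document length does not affect the assignment: the second document lands in the same cluster as the first even though its counts are doubled.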

Translation Transformers Rediscover Inherent Data Domains

tartunlp/inherent-domains-wmt21 • WMT (EMNLP) 2021

Here we analyze the sentence representations learned by NMT Transformers and show that these explicitly include information on text domains, even after only seeing the input sentences without domain labels.

Proposition-Level Clustering for Multi-Document Summarization

oriern/procluster • ACL ARR January 2022

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.

Subspace Co-clustering with Two-Way Graph Convolution

chakib401/SC3 • CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management 2022

We first extend the concept of subspace clustering to co-clustering, which has been extensively used on document-term matrices due to the resulting interplay between the document and term representations.

EASE: Entity-Aware Contrastive Learning of Sentence Embedding

studio-ousia/ease • NAACL 2022

We present EASE, a novel method for learning sentence embeddings via contrastive learning between sentences and their related entities.
