Text Clustering

32 papers with code • 3 benchmarks • 5 datasets

Grouping a set of texts in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). (Source: Adapted from Wikipedia)

Benchmarks

Add a Result

These leaderboards are used to track progress in Text Clustering

Dataset	Best Model	Compare
MTEB	ST5-XXL	See all
20 Newsgroups	G-BAT	See all
Urdu News Headlines Dataset	Vector Space Model	See all

Datasets

Subtasks

Most implemented papers

Most implemented Social Latest No code

Short Text Clustering via Convolutional Neural Networks

jacoxu/StackOverflow • • WS 2015

Paper
Code

Dissimilarity Mixture Autoencoder for Deep Clustering

larajuse/DMAE • • 15 Jun 2020

The dissimilarity mixture autoencoder (DMAE) is a neural network model for feature-based clustering that incorporates a flexible dissimilarity function and can be integrated into any kind of deep learning architecture.

Paper
Code

Discovering New Intents with Deep Aligned Clustering

thuiar/DeepAligned-Clustering • • 16 Dec 2020

In this work, we propose an effective method, Deep Aligned Clustering, to discover new intents with the aid of the limited known intent data.

Paper
Code

Supporting Clustering with Contrastive Learning

amazon-research/sccl • • NAACL 2021

Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space.

Paper
Code

Proposition-Level Clustering for Multi-Document Summarization

oriern/procluster • • NAACL 2022

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.

Paper
Code

MTEB: Massive Text Embedding Benchmark

embeddings-benchmark/mteb • • 13 Oct 2022

MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.

Paper
Code

Clustering Urdu News Using Headlines

SyedMuhammadFaheem/Urdu-News-Clustering • 23 2015

This paper that proposes and evaluates a new algorithm to automatically cluster Urdu news from different news agencies.

Paper
Code

Self-Taught Convolutional Neural Networks for Short Text Clustering

jacoxu/STC2 • • 1 Jan 2017

Short text clustering is a challenging problem due to its sparseness of text representation.

Paper
Code

ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"

elki-project/elki • 10 Feb 2019

We will first outline the motivation for this release, the plans for the future, and then give a brief overview over the new functionality in this version.

Paper
Code

On the Use of ArXiv as a Dataset

mattbierbaum/arxiv-public-datasets • 30 Apr 2019

We use this pipeline to extract and analyze a 6. 7 million edge citation graph, with an 11 billion word corpus of full-text research articles.

Paper
Code

Text Clustering

Benchmarks Add a Result

Datasets

Subtasks

Most implemented papers

Content

Benchmarks

Add a Result