Text Clustering
32 papers with code • 3 benchmarks • 5 datasets
Grouping a set of texts so that texts in the same group (called a cluster) are more similar (in some sense) to each other than to texts in other groups (clusters). (Source: Adapted from Wikipedia)
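The definition above can be sketched in a few lines of plain Python: represent each text as a bag-of-words vector, compare vectors with cosine similarity, and greedily group texts whose similarity exceeds a threshold. The greedy single-pass scheme and the 0.3 threshold are illustrative assumptions, not a method from any of the papers listed below.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering: attach each text to the first
    cluster whose seed text is similar enough, else start a new cluster."""
    vectors = [Counter(t.lower().split()) for t in texts]
    clusters = []  # each cluster is a list of indices into `texts`
    for i, v in enumerate(vectors):
        for c in clusters:
            if cosine(v, vectors[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return [[texts[i] for i in c] for c in clusters]

docs = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock markets fell sharply today",
    "markets fell on weak earnings today",
]
print(cluster(docs))  # two clusters: the cat texts and the market texts
```

Real systems replace the bag-of-words vectors with learned sentence embeddings (as in several papers below) and the greedy pass with k-means or similar, but the grouping-by-similarity idea is the same.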
Latest papers
More Discriminative Sentence Embeddings via Semantic Graph Smoothing
This paper explores an empirical approach to learning more discriminative sentence representations in an unsupervised fashion.
Elastic deep autoencoder for text embedding clustering by an improved graph regularization
This jointly trained, end-to-end deep learning model achieves better representations and text clustering results, with high accuracy on common datasets compared to existing methods.
Large Language Models Enable Few-Shot Clustering
In this paper, we ask whether a large language model can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering.
ClusterLLM: Large Language Models as a Guide for Text Clustering
First, we prompt ChatGPT for insights on clustering perspective by constructing hard triplet questions <does A better correspond to B than C>, where A, B, and C are similar data points that belong to different clusters according to a small embedder.
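The triplet form <does A better correspond to B than C> described above can be turned into a prompt string with a small helper. Only the triplet structure comes from the snippet; the exact wording, function name, and example texts here are illustrative assumptions, not ClusterLLM's actual prompt.

```python
def triplet_prompt(anchor: str, choice_b: str, choice_c: str) -> str:
    """Build a hard triplet question: does A better correspond to B than C?
    The phrasing is illustrative, not the paper's actual prompt template."""
    return (
        f"Does the text:\n  A: {anchor}\n"
        f"better correspond to\n  B: {choice_b}\n"
        f"than to\n  C: {choice_c}\n"
        "Answer B or C."
    )

# Hypothetical intent-classification texts used only for illustration.
print(triplet_prompt(
    "refund my order",
    "request a refund",
    "track my shipment",
))
```

The answers to such questions can then serve as supervision for adjusting the small embedder's notion of similarity.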
Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering
To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data.
Influence of various text embeddings on clustering performance in NLP
For example, a three-star rating (out of five) may be incongruous with the review text, which may be more suitable for a five-star review.
DeepLens: Interactive Out-of-distribution Data Detection in NLP Models
In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora.
Very Large Language Model as a Unified Methodology of Text Mining
Text data mining is the process of deriving essential information from language text.
MTEB: Massive Text Embedding Benchmark
MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.
Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases
Our sentence encoder can be trained in less than a day on a single graphics card, achieving high performance on a diverse set of sentence-level tasks.