Text Clustering

32 papers with code • 3 benchmarks • 5 datasets

Text clustering groups a set of texts so that texts in the same group (called a cluster) are more similar to each other, under some chosen similarity measure, than to texts in other clusters. (Source: Adapted from Wikipedia)
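As a minimal sketch of this definition (not taken from any paper listed on this page), the example below assumes bag-of-words count vectors, cosine similarity as the "some sense" of similarity, and a hypothetical greedy rule that puts a text into the first cluster whose first member is similar enough. The function names and the 0.5 threshold are illustrative assumptions only.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a sparse bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.5):
    """Greedy single-pass clustering (illustrative, not a benchmark method).

    Each text joins the first cluster whose first member has cosine
    similarity >= threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    vectors = [vectorize(t) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters


texts = [
    "the court ruled on the appeal",
    "the judge dismissed the appeal in court",
    "the striker scored a late goal",
    "a late goal won the match",
]
clusters = cluster(texts)
print(clusters)  # prints [[0, 1], [2, 3]]
```

The two legal sentences end up together and the two football sentences end up together because their within-group cosine similarity (about 0.7) exceeds their cross-group similarity (about 0.3). Practical systems typically replace the count vectors with TF-IDF or learned embeddings and the greedy rule with an algorithm such as k-means.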

Benchmarks

These leaderboards track progress in text clustering.

Dataset	Best Model
MTEB	ST5-XXL
20 Newsgroups	G-BAT
Urdu News Headlines Dataset	Vector Space Model

Latest papers with no code

Text clustering applied to data augmentation in legal contexts

no code yet • 8 Apr 2024

Data analysis and machine learning are of preeminent importance in the legal domain, especially in tasks like clustering and text classification.

Text clustering with LLM embeddings

no code yet • 22 Mar 2024

Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data.

An enhanced Teaching-Learning-Based Optimization (TLBO) with Grey Wolf Optimizer (GWO) for text feature selection and clustering

no code yet • 19 Feb 2024

Text document clustering can play a vital role in organizing and handling the ever-increasing number of text documents.

Automatic Construction of Multi-faceted User Profiles using Text Clustering and its Application to Expert Recommendation and Filtering Problems

no code yet • 19 Jan 2024

In this article, we tackle the problems of profile-based expert recommendation and document filtering from a machine learning perspective by clustering expert textual sources to build profiles and capture the different hidden topics in which the experts are interested.

Incremental hierarchical text clustering methods: a review

no code yet • 12 Dec 2023

Based on the relevance and contemporary nature of the field, this study aims to analyze various hierarchical and incremental clustering techniques; the main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed text document clustering.

Federated Learning for Short Text Clustering

no code yet • 23 Nov 2023

The robust short text clustering module aims to train an effective short text clustering model with local data in each client.

LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM

no code yet • 10 May 2023

Third, we apply a Siamese architecture to the BLOOM model with a contrastive objective to ease the scarcity of multilingual labeled data.

CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering

no code yet • 20 Apr 2023

To address this issue, we propose CEIL, a novel Classification-Enhanced Iterative Learning framework for short text clustering, which aims at generally promoting the clustering performance by introducing a classification objective to iteratively improve feature representations.

AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

no code yet • 14 Feb 2023

We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results.

ClusTop: An unsupervised and integrated text clustering and topic extraction framework

no code yet • 3 Jan 2023

Text clustering and topic extraction are two important tasks in text mining.
