Text Clustering

32 papers with code • 3 benchmarks • 5 datasets

Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar to each other, in some sense, than to texts in other clusters. (Source: Adapted from Wikipedia)
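
As a rough illustration of the definition above, the following minimal sketch groups a few sentences by word overlap. It is not taken from any paper listed on this page: the function names (`bag_of_words`, `cosine`, `cluster_texts`), the similarity threshold of 0.3, and the sample sentences are all assumptions chosen for illustration. Bag-of-words cosine similarity stands in for "similar in some sense", and a simple greedy single-pass scheme stands in for a real clustering algorithm.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_texts(texts, threshold=0.3):
    """Greedy single-pass clustering (hypothetical helper).

    Each text joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    vectors = [bag_of_words(t) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters


# Made-up sample sentences: two legal, two about embeddings.
texts = [
    "the court ruled on the contract dispute",
    "the judge ruled the contract was void",
    "neural networks learn text embeddings",
    "text embeddings from neural language models",
]
print(cluster_texts(texts))  # prints [[0, 1], [2, 3]]
```

The result depends on the order of the texts and on the threshold, which is one reason practical systems tend to use methods such as k-means or hierarchical clustering over learned embeddings instead.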

Latest papers with no code

Text clustering applied to data augmentation in legal contexts

no code yet • 8 Apr 2024

Data analysis and machine learning are of preeminent importance in the legal domain, especially in tasks like clustering and text classification.

Text clustering with LLM embeddings

no code yet • 22 Mar 2024

Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data.

An enhanced Teaching-Learning-Based Optimization (TLBO) with Grey Wolf Optimizer (GWO) for text feature selection and clustering

no code yet • 19 Feb 2024

Text document clustering can play a vital role in organizing and handling the ever-increasing number of text documents.

Automatic Construction of Multi-faceted User Profiles using Text Clustering and its Application to Expert Recommendation and Filtering Problems

no code yet • 19 Jan 2024

In this article, we tackle the problems of profile-based expert recommendation and document filtering from a machine learning perspective by clustering expert textual sources to build profiles and capture the different hidden topics in which the experts are interested.

Incremental hierarchical text clustering methods: a review

no code yet • 12 Dec 2023

Based on the relevance and contemporary nature of the field, this study aims to analyze various hierarchical and incremental clustering techniques; the main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed text document clustering.

Federated Learning for Short Text Clustering

no code yet • 23 Nov 2023

The robust short text clustering module aims to train an effective short text clustering model with local data in each client.

LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM

no code yet • 10 May 2023

Third, we apply a Siamese architecture on the BLOOM model with a contrastive objective to ease the multi-lingual labeled data scarcity.

CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering

no code yet • 20 Apr 2023

To address this issue, we propose CEIL, a novel Classification-Enhanced Iterative Learning framework for short text clustering, which aims at generally promoting the clustering performance by introducing a classification objective to iteratively improve feature representations.

AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

no code yet • 14 Feb 2023

We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results.

ClusTop: An unsupervised and integrated text clustering and topic extraction framework

no code yet • 3 Jan 2023

Text clustering and topic extraction are two important tasks in text mining.