Text Clustering
32 papers with code • 3 benchmarks • 5 datasets
Grouping a set of texts so that texts in the same group (called a cluster) are more similar to each other, in some sense, than to texts in other clusters. (Source: Adapted from Wikipedia)
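To make the definition concrete, here is a minimal sketch in plain Python. It is an illustration only, not the method of any paper listed on this page. It assumes bag-of-words vectors, cosine similarity as the notion of "similar", and a greedy single-pass grouping rule. The function names, the example texts, and the 0.3 threshold are all hypothetical choices.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_texts(texts, threshold=0.3):
    """Greedy single-pass clustering.

    Each text joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    vectors = [bag_of_words(t) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters


texts = [
    "the court ruled on the contract dispute",
    "the judge dismissed the contract case in court",
    "neural networks learn text embeddings",
    "text embeddings from neural language models",
]
print(cluster_texts(texts))  # [[0, 1], [2, 3]]
```

The two legal texts share enough vocabulary to land in one cluster and the two embedding texts in another. In practice, the representation (for example TF-IDF or learned embeddings) and the grouping algorithm are usually far more sophisticated than this sketch.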
Latest papers with no code
Text clustering applied to data augmentation in legal contexts
Data analysis and machine learning are of preeminent importance in the legal domain, especially in tasks like clustering and text classification.
Text clustering with LLM embeddings
Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data.
An enhanced Teaching-Learning-Based Optimization (TLBO) with Grey Wolf Optimizer (GWO) for text feature selection and clustering
Text document clustering can play a vital role in organizing and handling the ever-increasing number of text documents.
Automatic Construction of Multi-faceted User Profiles using Text Clustering and its Application to Expert Recommendation and Filtering Problems
In this article, we tackle the problems of profile-based expert recommendation and document filtering from a machine learning perspective by clustering expert textual sources to build profiles and capture the different hidden topics in which the experts are interested.
Incremental hierarchical text clustering methods: a review
Based on the relevance and contemporary nature of the field, this study aims to analyze various hierarchical and incremental clustering techniques; the main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed text document clustering.
Federated Learning for Short Text Clustering
The robust short text clustering module aims to train an effective short text clustering model with local data in each client.
LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM
Third, we apply a Siamese architecture on the BLOOM model with a contrastive objective to ease the scarcity of multilingual labeled data.
CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering
To address this issue, we propose CEIL, a novel Classification-Enhanced Iterative Learning framework for short text clustering, which aims to improve clustering performance in general by introducing a classification objective that iteratively improves feature representations.
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results.
ClusTop: An unsupervised and integrated text clustering and topic extraction framework
Text clustering and topic extraction are two important tasks in text mining.