Text Clustering
37 papers with code • 3 benchmarks • 5 datasets
Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar, in some sense, to each other than to texts in other clusters. (Source: Adapted from Wikipedia)
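As a rough illustration of the definition above, the following minimal sketch groups a few toy sentences by bag-of-words cosine similarity. It is not taken from any of the papers listed on this page; the function names, the greedy single-pass assignment, and the 0.3 threshold are all assumptions chosen for illustration, and real systems typically use learned embeddings and a proper clustering algorithm.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_texts(texts, threshold=0.3):
    """Greedy single-pass clustering (hypothetical helper, for illustration).

    Each text joins the first cluster whose seed (first member) is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    vectors = [bag_of_words(t) for t in texts]
    clusters = []  # each cluster is a list of indices into `texts`
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


texts = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
clusters = cluster_texts(texts)
print(clusters)  # [[0, 1], [2, 3]]
```

The two cat sentences share four tokens and land in one cluster, while the two stock sentences form another; no token is shared across the two groups, so their cross-group similarity is zero.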
Most implemented papers
MTEB: Massive Text Embedding Benchmark
MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.
Dissimilarity Mixture Autoencoder for Deep Clustering
The dissimilarity mixture autoencoder (DMAE) is a neural network model for feature-based clustering that incorporates a flexible dissimilarity function and can be integrated into any kind of deep learning architecture.
Discovering New Intents with Deep Aligned Clustering
In this work, we propose an effective method, Deep Aligned Clustering, to discover new intents with the aid of the limited known intent data.
Supporting Clustering with Contrastive Learning
Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space.
Proposition-Level Clustering for Multi-Document Summarization
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
Clustering Urdu News Using Headlines
This paper proposes and evaluates a new algorithm to automatically cluster Urdu news from different news agencies.
Self-Taught Convolutional Neural Networks for Short Text Clustering
Short text clustering is a challenging problem because short texts yield sparse representations.
ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"
We will first outline the motivation for this release and the plans for the future, and then give a brief overview of the new functionality in this version.
On the Use of ArXiv as a Dataset
We use this pipeline to extract and analyze a 6.7 million edge citation graph, with an 11 billion word corpus of full-text research articles.