Text Clustering
23 papers with code • 2 benchmarks • 4 datasets
Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar (in some sense) to each other than to texts in other groups. (Source: Adapted from Wikipedia)
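As a rough illustration of the definition above, the sketch below clusters four toy sentences with a k-means-style loop over bag-of-words vectors and cosine similarity. It is not taken from any paper listed on this page. The helper names (`vectorize`, `cosine`, `mean_vector`, `cluster_texts`), the example sentences, and the choice of seed texts are all assumptions made for illustration, and real systems typically use learned embeddings rather than raw word counts.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one text (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Average a list of sparse vectors into one centroid."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return Counter({term: w / len(vectors) for term, w in total.items()})


def cluster_texts(texts, seed_indices, max_iterations=10):
    """Assign each text to the most similar centroid, then recompute centroids.

    seed_indices picks the texts used as initial centroids (hypothetical
    parameter; fixed seeds keep this example deterministic).
    """
    vectors = [vectorize(t) for t in texts]
    centroids = [vectors[i] for i in seed_indices]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for vec in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_vector(members)
    return labels


texts = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "stock markets fell sharply today",
    "investors sold stock as markets fell",
]
labels = cluster_texts(texts, seed_indices=(0, 2))
print(labels)  # [0, 0, 1, 1]
```

The two cat sentences share vocabulary with each other but not with the finance sentences, so each pair ends up in its own cluster. With very short texts such overlap is often missing, which is the sparsity problem several of the papers below address.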
Most implemented papers
Dissimilarity Mixture Autoencoder for Deep Clustering
The dissimilarity mixture autoencoder (DMAE) is a neural network model for feature-based clustering that incorporates a flexible dissimilarity function and can be integrated into any kind of deep learning architecture.
Discovering New Intents with Deep Aligned Clustering
In this work, we propose an effective method, Deep Aligned Clustering, to discover new intents with the aid of the limited known intent data.
Supporting Clustering with Contrastive Learning
Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space.
Proposition-Level Clustering for Multi-Document Summarization
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
Self-Taught Convolutional Neural Networks for Short Text Clustering
Short text clustering is a challenging problem due to the sparseness of its text representation.
On the Use of ArXiv as a Dataset
We use this pipeline to extract and analyze a 6.7 million edge citation graph, with an 11 billion word corpus of full-text research articles.
A Self-Training Approach for Short Text Clustering
Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts.
Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement
Identifying new user intents is an essential task in dialogue systems.
Neural Topic Modeling with Bidirectional Adversarial Training
Recent years have witnessed a surge of interest in using neural topic models for automatic topic extraction from text, since they avoid the complicated mathematical derivations for model inference required by traditional topic models such as Latent Dirichlet Allocation (LDA).