Short Text Clustering
14 papers with code • 8 benchmarks • 2 datasets
Latest papers
Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering
We propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data.
Twin Contrastive Learning for Online Clustering
We find that when the data is projected into a feature space whose dimensionality equals the target number of clusters, the rows and columns of the resulting feature matrix correspond to the instance and cluster representations, respectively.
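The row/column reading described above can be illustrated with a small sketch. This is not the paper's implementation: the matrix values are made up, the helper names are hypothetical, and taking the argmax of each row as a cluster assignment is an assumption added only for illustration.

```python
def rows_and_columns(features):
    """Split an N x K feature matrix into its rows and its columns.

    With K equal to the target number of clusters, each row is read as an
    instance representation and each column as a cluster representation.
    """
    instance_reprs = [list(row) for row in features]
    cluster_reprs = [list(col) for col in zip(*features)]
    return instance_reprs, cluster_reprs


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical features for N = 4 short texts and K = 2 target clusters.
features = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.1, 0.9],
]

instance_reprs, cluster_reprs = rows_and_columns(features)

# Assumption for illustration: assign each instance to its largest coordinate.
assignments = [argmax(row) for row in instance_reprs]
```

Here `instance_reprs` holds four vectors of length 2 (one per text), while `cluster_reprs` holds two vectors of length 4 (one per cluster), which is the sense in which a single matrix can serve both views.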
EASE: Entity-Aware Contrastive Learning of Sentence Embedding
We present EASE, a novel method for learning sentence embeddings via contrastive learning between sentences and their related entities.
DECAF: Deep Extreme Classification with Label Features
This paper develops the DECAF algorithm, which enriches its models with label metadata, jointly learns model parameters and feature representations using deep networks, and offers accurate classification at the scale of millions of labels.
ECLARE: Extreme Classification with Label Graph Correlations
This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text, but also label correlations, to offer accurate real-time predictions within a few milliseconds.
Efficient Sparse Spherical k-Means for Document Clustering
Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient.
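For readers unfamiliar with the method, the following is a minimal dense sketch of plain spherical k-means, assuming unit-normalized vectors, cosine similarity for assignment, and caller-supplied initial centers. It does not reproduce the sparse optimizations the paper is about; the function names and the toy term-count vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, init_indices, n_iter=10):
    """Cluster vectors by cosine similarity.

    init_indices picks the points used as initial centers, which keeps
    this sketch deterministic.
    """
    points = [normalize(v) for v in vectors]
    centers = [points[i] for i in init_indices]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: on unit vectors, the dot product is the cosine.
        labels = [
            max(range(len(centers)), key=lambda c: dot(p, centers[c]))
            for p in points
        ]
        # Update step: each center becomes the normalized sum of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                summed = [sum(col) for col in zip(*members)]
                centers[c] = normalize(summed)
    return labels, centers


# Toy term-count vectors for four short documents over a 3-word vocabulary.
docs = [[2, 0, 0], [3, 1, 0], [0, 0, 4], [0, 1, 5]]
labels, centers = spherical_kmeans(docs, init_indices=[0, 2])
```

Because every point and center lies on the unit sphere, document length does not affect the assignment, which is one reason the method suits term-frequency representations.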
Supporting Clustering with Contrastive Learning
Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space.
Discovering New Intents with Deep Aligned Clustering
In this work, we propose an effective method, Deep Aligned Clustering, to discover new intents with the aid of the limited known intent data.
Intent Mining from past conversations for conversational agent
In this paper, we present an intent discovery framework that generates intent training data from raw conversations in four primary steps: extraction of textual utterances from a conversation using a pre-trained, domain-agnostic Dialog Act Classifier (Data Extraction); automatic clustering of similar user utterances (Clustering); manual annotation of clusters with an intent label (Labeling); and propagation of intent labels to the utterances from the previous step that are not mapped to any cluster (Label Propagation).
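The final Label Propagation step can be sketched as follows. The excerpt does not say how similarity is measured, so this sketch swaps in token-level Jaccard overlap with a fixed threshold purely as a stand-in; the function names, the threshold value, and the example utterances are all hypothetical.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate_labels(labeled, unlabeled, threshold=0.2):
    """Give each unclustered utterance the intent of its most similar labeled one.

    labeled maps an intent label to a non-empty list of example utterances.
    Utterances whose best similarity is below the threshold stay unlabeled.
    """
    result = {}
    for utt in unlabeled:
        best_label, best_score = None, 0.0
        for label, examples in labeled.items():
            score = max(jaccard(tokens(utt), tokens(e)) for e in examples)
            if score > best_score:
                best_label, best_score = label, score
        result[utt] = best_label if best_score >= threshold else None
    return result


# Hypothetical clusters that an annotator has already labeled.
labeled = {
    "check_balance": ["what is my account balance", "show my balance"],
    "reset_password": ["i forgot my password", "reset my password please"],
}
# Hypothetical utterances that were not mapped to any cluster.
unlabeled = [
    "can you show my account balance",
    "please reset the password",
    "good morning",
]
propagated = propagate_labels(labeled, unlabeled)
```

In practice a learned sentence embedding would likely replace the token overlap; the structure of the step (compare each leftover utterance against labeled clusters, keep the best match above a cutoff) is the part this sketch is meant to show.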
Enhancement of Short Text Clustering by Iterative Classification
Short text clustering is a challenging task due to the lack of signal contained in such short texts.