Short Text Clustering
12 papers with code • 8 benchmarks • 1 dataset
In this work, we propose an effective method, Deep Aligned Clustering, to discover new intents with the aid of the limited known intent data.
Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts.
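The sparsity problem can be illustrated with a minimal sketch using only the Python standard library. The example utterances, the helper names `tfidf_vectors` and `density`, and the smoothed IDF formula are assumptions chosen for illustration; they are not taken from any of the listed papers.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Return (vocabulary, dense TF-IDF vectors) for whitespace-tokenized texts."""
    docs = [Counter(t.lower().split()) for t in texts]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Document frequency: number of texts that contain each word.
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    # Smoothed IDF (one common variant, assumed here for illustration).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    # Counter returns 0 for absent words, so most entries are zero.
    vectors = [[d[w] * idf[w] for w in vocab] for d in docs]
    return vocab, vectors

def density(vectors):
    """Fraction of non-zero entries across all vectors."""
    total = sum(len(v) for v in vectors)
    nonzero = sum(1 for v in vectors for x in v if x != 0)
    return nonzero / total

# Hypothetical short texts, e.g. user utterances.
texts = [
    "reset my password",
    "forgot login password",
    "track my order",
    "where is my package",
    "cancel subscription now",
]

vocab, vectors = tfidf_vectors(texts)
print(f"vocabulary size: {len(vocab)}")
print(f"non-zero fraction: {density(vectors):.3f}")
```

Each text touches only three or four of the vocabulary dimensions, and the share of non-zero entries shrinks further as the vocabulary grows, which leaves little overlap for distance-based clustering to work with.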
In this paper, we present an intent discovery framework that generates intent training data from raw conversations in four primary steps: extraction of textual utterances from a conversation using a pre-trained, domain-agnostic Dialog Act Classifier (Data Extraction); automatic clustering of similar user utterances (Clustering); manual annotation of clusters with an intent label (Labeling); and propagation of intent labels to the utterances from the previous step that are not mapped to any cluster (Label Propagation).
Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient.
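As a rough sketch of the idea, spherical k-means can be written as ordinary k-means on unit-length vectors, with assignment by cosine similarity and centroids re-normalized after each update. The function name `spherical_kmeans`, the `init_indices` seeding parameter, and the toy data below are assumptions made for illustration, not the implementation studied in the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(points, init_indices, max_iter=100):
    """Cluster vectors by cosine similarity.

    init_indices: indices of the points used as initial centroids
    (deterministic seeding, assumed here to keep the example reproducible).
    """
    unit = [normalize(p) for p in points]
    centroids = [unit[i] for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment: on unit vectors, the dot product equals cosine similarity.
        new_labels = [
            max(range(len(centroids)), key=lambda c: dot(u, centroids[c]))
            for u in unit
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update: sum the members of each cluster and project back onto the sphere.
        for c in range(len(centroids)):
            members = [u for u, l in zip(unit, labels) if l == c]
            if members:
                summed = [sum(col) for col in zip(*members)]
                centroids[c] = normalize(summed)
    return labels, centroids

# Toy data: two directions, with very different vector lengths.
points = [[1.0, 0.1], [10.0, 2.0], [0.2, 1.0], [1.0, 8.0]]
labels, centroids = spherical_kmeans(points, init_indices=[0, 2])
print(labels)
```

Because only direction matters, the vectors `[1.0, 0.1]` and `[10.0, 2.0]` fall into the same cluster despite their different lengths, which is one reason the method suits documents of varying size.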
This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text, but also label correlations, to offer accurate real-time predictions within a few milliseconds.