Intent Detection and Discovery from User Logs via Deep Semi-Supervised Contrastive Clustering

Intent detection is a crucial component of dialogue systems, wherein the objective is to classify a user utterance into one of multiple pre-defined intents. A prerequisite for developing an effective intent identifier is a training dataset labeled with all possible user intents. However, even skilled domain experts are often unable to foresee all possible user intents at design time, and in practical applications novel intents may have to be inferred incrementally, on the fly, from user utterances. Therefore, for any real-world dialogue system, the number of intents increases over time, and new intents have to be discovered by analyzing the utterances that fall outside the existing set of intents. In this paper, our objective is to (i) detect known-intent utterances from a large number of unlabeled utterance samples given a few labeled samples and (ii) discover new, unknown intents from the remaining unlabeled samples. Existing SOTA approaches address this problem by alternating between representation learning and clustering, wherein pseudo labels are used for updating the representations and clustering is used for generating the pseudo labels. Unlike existing approaches that rely on epoch-wise cluster alignment, we propose an end-to-end deep contrastive clustering algorithm that jointly updates model parameters and cluster centers via supervised and self-supervised learning and optimally utilizes both labeled and unlabeled data. Our proposed approach outperforms competitive baselines on five public datasets in both settings: (i) where the number of undiscovered intents is known in advance, and (ii) where the number of intents is estimated by an algorithm. We also propose a human-in-the-loop variant of our approach for practical deployment, which does not require an estimate of the number of new intents and outperforms the end-to-end approach.
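
As a rough illustration of the self-supervised contrastive ingredient mentioned in the abstract, the sketch below computes a generic InfoNCE-style loss over paired views of utterance embeddings. It is not the paper's objective: the function names (`cosine`, `info_nce`), the temperature value, and the toy 2-d embeddings are all assumptions made for illustration, and the supervised term and cluster-center updates described in the abstract are not modeled here.

```python
from math import exp, log, sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss over a batch (hypothetical helper, not from the paper).

    The positive for anchors[i] is positives[i]; every other row of
    `positives` acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + log(sum(exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings: two utterances, each with two "views".
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each view matches its anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each view matches the other anchor

print(info_nce(view_a, aligned))
print(info_nce(view_a, swapped))
```

The loss is small when each anchor is most similar to its own positive and large when it is more similar to another sample's view, which is the property a contrastive clustering method relies on to pull same-intent utterances together.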


Results from the Paper


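Task                   Dataset        Model  Metric Name          Metric Value  Global Rank
Open Intent Discovery  BANKING77      DSSCC  NMI                  0.8124        # 1
Open Intent Discovery  BANKING77      DSSCC  ARI                  0.5809        # 1
Open Intent Discovery  BANKING77      DSSCC  ACC                  69.82         # 1
Open Intent Discovery  CLINC150       DSSCC  NMI                  0.9387        # 2
Open Intent Discovery  CLINC150       DSSCC  ARI                  0.8109        # 1
Open Intent Discovery  CLINC150       DSSCC  ACC                  87.91         # 1
Open Intent Discovery  DBpedia        DSSCC  Clustering Accuracy  92.73         # 1
Open Intent Discovery  SNIPS          DSSCC  NMI                  90.44         # 1
Open Intent Discovery  SNIPS          DSSCC  ARI                  89.03         # 1
Open Intent Discovery  SNIPS          DSSCC  ACC                  94.87         # 1
Open Intent Discovery  Stackoverflow  DSSCC  NMI                  77.08         # 1
Open Intent Discovery  Stackoverflow  DSSCC  ARI                  68.67         # 1
Open Intent Discovery  Stackoverflow  DSSCC  ACC                  82.65         # 1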
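
For readers unfamiliar with the three metrics reported above, the following is a minimal pure-Python sketch of how NMI, ARI, and clustering accuracy (ACC) are commonly defined. It is not the paper's evaluation code. The function names are hypothetical, NMI is normalized by the arithmetic mean of the two entropies (one common convention, assumed here), and ACC uses a brute-force search over cluster-to-label mappings that is only practical for a handful of clusters; real evaluations typically use the Hungarian algorithm for that step. All three functions return values on a 0 to 1 scale (ARI can be negative), whereas the results above mix fractional and percentage scales.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def clustering_accuracy(y_true, y_pred):
    """ACC: best accuracy over one-to-one mappings of cluster ids to labels."""
    labels = list(dict.fromkeys(y_true))
    clusters = list(dict.fromkeys(y_pred))
    # Pad with dummy labels so every cluster can be assigned something.
    labels += [None] * max(0, len(clusters) - len(labels))
    pair_counts = Counter(zip(y_pred, y_true))
    best = 0
    for mapping in permutations(labels, len(clusters)):
        hits = sum(pair_counts[(c, l)] for c, l in zip(clusters, mapping))
        best = max(best, hits)
    return best / len(y_true)


def adjusted_rand_index(y_true, y_pred):
    """ARI: pair-counting agreement, corrected for chance."""
    n = len(y_true)
    joint = sum(comb(v, 2) for v in Counter(zip(y_true, y_pred)).values())
    rows = sum(comb(v, 2) for v in Counter(y_true).values())
    cols = sum(comb(v, 2) for v in Counter(y_pred).values())
    expected = rows * cols / comb(n, 2)
    maximum = (rows + cols) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (joint - expected) / (maximum - expected)


def normalized_mutual_info(y_true, y_pred):
    """NMI: mutual information divided by the mean of the two entropies."""
    n = len(y_true)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    mutual_info = sum(
        (nij / n) * log(n * nij / (true_counts[t] * pred_counts[p]))
        for (t, p), nij in Counter(zip(y_true, y_pred)).items()
    )
    entropy_true = -sum((c / n) * log(c / n) for c in true_counts.values())
    entropy_pred = -sum((c / n) * log(c / n) for c in pred_counts.values())
    mean_entropy = (entropy_true + entropy_pred) / 2
    return 1.0 if mean_entropy == 0 else mutual_info / mean_entropy


# Toy example: cluster ids are a relabeling of the true intents.
intents = [0, 0, 1, 1, 2, 2]
clusters = [1, 1, 0, 0, 2, 2]
print(clustering_accuracy(intents, clusters))
print(adjusted_rand_index(intents, clusters))
print(normalized_mutual_info(intents, clusters))
```

Because cluster ids carry no inherent meaning, all three metrics are invariant to how the clusters are numbered; ACC achieves this through the explicit mapping search, while NMI and ARI depend only on the contingency counts.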
