Text Clustering

15 papers with code • 1 benchmarks • 2 datasets

Grouping a set of texts in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). (Source: Adapted from Wikipedia)

Most implemented papers

Dissimilarity Mixture Autoencoder for Deep Clustering

larajuse/DMAE 15 Jun 2020

The dissimilarity mixture autoencoder (DMAE) is a neural network model for feature-based clustering that incorporates a flexible dissimilarity function and can be integrated into any kind of deep learning architecture.

Discovering New Intents with Deep Aligned Clustering

thuiar/DeepAligned-Clustering 16 Dec 2020

In this work, we propose an effective method, Deep Aligned Clustering, to discover new intents with the aid of the limited known intent data.

Supporting Clustering with Contrastive Learning

amazon-research/sccl NAACL 2021

Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space.

Self-Taught Convolutional Neural Networks for Short Text Clustering

jacoxu/STC2 1 Jan 2017

Short text clustering is a challenging problem due to its sparseness of text representation.

On the Use of ArXiv as a Dataset

mattbierbaum/arxiv-public-datasets 30 Apr 2019

We use this pipeline to extract and analyze a 6. 7 million edge citation graph, with an 11 billion word corpus of full-text research articles.

A Self-Training Approach for Short Text Clustering

hadifar/stc_clustering WS 2019

Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts.

Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement

thuiar/CDAC-plus 20 Nov 2019

Identifying new user intents is an essential task in the dialogue system.

Neural Topic Modeling with Bidirectional Adversarial Training

zll17/Neural_Topic_Models ACL 2020

Recent years have witnessed a surge of interests of using neural topic models for automatic topic extraction from text, since they avoid the complicated mathematical derivations for model inference as in traditional topic models such as Latent Dirichlet Allocation (LDA).

ComStreamClust: a communicative multi-agent approach to text clustering in streaming data

AliNajafi1998/ComStream 11 Oct 2020

Topic detection is the task of determining and tracking hot topics in social media.