TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Short Text Clustering	Biomedical	STC2-LE	Acc	43.62	# 3
Short Text Clustering	Biomedical	STC2-LPI	Acc	43	# 4
Short Text Clustering	Searchsnippets	STC2-LPI	Acc	77.01	# 4
Short Text Clustering	Searchsnippets	STC2-LE	Acc	77.09	# 3
Short Text Clustering	Stackoverflow	Deep ECIC	Acc	STC2-LE	# 2
Short Text Clustering	Stackoverflow	Deep ECIC	Acc	STC2-LPI	# 2
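
The Acc column is not defined on this page. A minimal sketch follows, assuming it is the clustering accuracy commonly reported for this task: the percentage of texts labelled correctly under the best one-to-one mapping from cluster ids to ground-truth classes. The function name `clustering_accuracy` and the toy labels are hypothetical, and the brute-force search over mappings stands in for the Hungarian algorithm that would be used with more than a handful of clusters.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Percentage of items matched under the most favourable one-to-one
    mapping from cluster ids to true classes (assumed meaning of Acc)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best_hits = 0
    # Brute force over every mapping; only practical for a few clusters.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return 100.0 * best_hits / len(true_labels)


# Toy example: cluster ids are arbitrary, so only the grouping matters.
truth = ["a", "a", "a", "b", "b", "c"]
found = [2, 2, 0, 0, 0, 1]
print(clustering_accuracy(truth, found))
```

In the toy example the best mapping is 2 to "a", 0 to "b" and 1 to "c", which matches five of the six items.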

Self-Taught Convolutional Neural Networks for Short Text Clustering

1 Jan 2017 · Jiaming Xu, Peng Wang, Suncong Zheng, Guanhua Tian, Jun Zhao, Bo Xu

Short text clustering is challenging because short texts yield sparse representations. Here we propose a flexible Self-Taught Convolutional neural network framework for Short Text Clustering (dubbed STC^2), which can incorporate more useful semantic features and learn non-biased deep text representations in an unsupervised manner. In our framework, the original raw text features are first embedded into compact binary codes using an existing unsupervised dimensionality reduction method. Then, word embeddings are fed into convolutional neural networks to learn deep feature representations, while the output units are trained to fit the pre-trained binary codes. Finally, we obtain the clusters by applying K-means to the learned representations. Extensive experimental results demonstrate that the proposed framework is effective and flexible, and that it outperforms several popular clustering methods on three public short text datasets.
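
The first and last stages described in the abstract can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the low-dimensional vectors are turned into binary codes, so thresholding each dimension at its median is an assumption; the convolutional network that would be trained to fit those codes is omitted entirely; and the helper names `binary_codes` and `kmeans`, the toy vectors, and the choice to seed K-means with the first k points are all hypothetical.

```python
from statistics import median


def binary_codes(embedded):
    """Turn low-dimensional real vectors into binary codes by thresholding
    each dimension at its median (assumed binarisation rule)."""
    dims = range(len(embedded[0]))
    thresholds = [median(row[d] for row in embedded) for d in dims]
    return [[1 if row[d] > thresholds[d] else 0 for d in dims] for row in embedded]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k points so the
    sketch is deterministic (an assumption, not a recommended seeding)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; leave it if it has none.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Stage 1: vectors assumed to come from some dimensionality reduction method.
reduced = [[0.1, 2.0], [0.2, 1.8], [0.9, 0.3], [1.0, 0.1]]
print(binary_codes(reduced))

# Stage 3: toy stand-ins for the representations a trained network would produce.
features = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
print(kmeans(features, k=2))
```

In the full framework the binary codes would serve as training targets for the network's output units, and K-means would run on the network's learned representations rather than on hand-written vectors.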

Code

jacoxu/STC2 official

115

Tasks

Clustering

Dimensionality Reduction

Short Text Clustering

Text Clustering

Word Embeddings

Results from the Paper

Ranked #2 on Short Text Clustering on Stackoverflow
