Improving Document Classification with Multi-Sense Embeddings

18 Nov 2019  ·  Vivek Gupta, Ankit Saw, Pegah Nokhiz, Harshit Gupta, Partha Talukdar

Efficient representation of text documents is an important building block in many NLP tasks. Research on long text categorization has shown that simple weighted averaging of word vectors for sentence representation often outperforms more sophisticated neural models. The recently proposed Sparse Composite Document Vector (SCDV) (Mekala et al., 2017) extends this approach from sentences to documents by using soft clustering over word vectors. However, SCDV disregards the multi-sense nature of words and suffers from the curse of high dimensionality. In this work, we address these shortcomings and propose SCDV-MS. SCDV-MS utilizes multi-sense word embeddings and learns a lower-dimensional manifold. Through extensive experiments on multiple real-world datasets, we show that SCDV-MS embeddings outperform previous state-of-the-art embeddings on multi-class and multi-label text categorization tasks. Furthermore, SCDV-MS embeddings are more efficient than SCDV in terms of time and space complexity on text classification tasks.

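The abstract describes the SCDV pipeline that SCDV-MS builds on: soft-cluster pre-trained word vectors, form per-word cluster-weighted vectors scaled by idf, average them over a document, and sparsify the result. The sketch below illustrates only that base composition (it omits the multi-sense embeddings and lower-dimensional manifold that SCDV-MS adds); the function name, the word_vecs/idf inputs, and the hyperparameter values are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of an SCDV-style document vector (assumed inputs:
# pre-trained word vectors and idf weights; hyperparameters are illustrative).
import numpy as np
from sklearn.mixture import GaussianMixture

def build_scdv_vectors(docs, word_vecs, idf, n_clusters=60, sparsity_pct=0.04):
    """docs: list of token lists; word_vecs: {word: np.ndarray(d)}; idf: {word: float}."""
    vocab = list(word_vecs)
    X = np.stack([word_vecs[w] for w in vocab])                  # (V, d)
    # Soft clustering over word vectors, as in SCDV.
    gmm = GaussianMixture(n_components=n_clusters,
                          covariance_type="diag", random_state=0).fit(X)
    probs = gmm.predict_proba(X)                                 # (V, K) soft assignments
    d = X.shape[1]
    # Word-cluster vector: idf(w) * concat_k( p(k|w) * v_w ), length K*d.
    wcv = {w: idf.get(w, 1.0) * (probs[i][:, None] * X[i][None, :]).ravel()
           for i, w in enumerate(vocab)}
    doc_vecs = []
    for doc in docs:
        tokens = [t for t in doc if t in wcv]
        vec = np.mean([wcv[t] for t in tokens], axis=0) if tokens \
              else np.zeros(n_clusters * d)
        # Hard-threshold sparsification of near-zero components.
        cutoff = sparsity_pct * np.abs(vec).max()
        vec = np.where(np.abs(vec) < cutoff, 0.0, vec)
        doc_vecs.append(vec)
    return np.stack(doc_vecs)
```

A call such as build_scdv_vectors(tokenized_docs, word_vecs, idf) would yield one K*d-dimensional vector per document, which can then be fed to a standard classifier.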

Datasets

20NEWS, Reuters-21578

Results from the Paper


Task                     Dataset        Model    Metric     Value  Global Rank
Text Classification      20NEWS         SCDV-MS  Accuracy   86.19  #11
Text Classification      20NEWS         SCDV-MS  F-measure  86.16  #4
Text Classification      20NEWS         SCDV-MS  Precision  86.2   #2
Text Classification      20NEWS         SCDV-MS  Recall     86.18  #2
Document Classification  Reuters-21578  SCDV-MS  F1         82.71  #5

Methods


No methods listed for this paper.