Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches

29 Nov 2022 · Tim Schopf, Daniel Braun, Florian Matthes

Text classification of unseen classes is a challenging Natural Language Processing task that is mainly attempted using two types of approaches. Similarity-based approaches classify instances based on similarities between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents. Although existing studies have investigated individual approaches in these categories, the experiments in the literature do not provide a consistent comparison. This paper addresses this gap by conducting a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes. Different state-of-the-art approaches are benchmarked on four text classification datasets, including a new dataset from the medical domain. Additionally, novel SimCSE-based and SBERT-based baselines are proposed, as other baselines used in existing work yield weak classification results and are easily outperformed. Finally, the novel similarity-based Lbl2TransformerVec approach is presented, which outperforms previous state-of-the-art approaches in unsupervised text classification. Our experiments show that similarity-based approaches significantly outperform zero-shot approaches in most cases. Additionally, using SimCSE or SBERT embeddings instead of simpler text representations improves similarity-based classification results even further.
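The similarity-based idea described in the abstract can be illustrated with a minimal sketch: each document is assigned the class whose description vector is most similar to the document vector. This is not the Lbl2TransformerVec implementation. The function names, the toy three-dimensional vectors, and the class labels are hypothetical stand-ins for real SimCSE or SBERT embeddings, and cosine similarity is assumed as the similarity measure.

```python
import numpy as np


def cosine_similarity_matrix(docs: np.ndarray, classes: np.ndarray) -> np.ndarray:
    """Return a (n_docs, n_classes) matrix of cosine similarities."""
    docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    classes_n = classes / np.linalg.norm(classes, axis=1, keepdims=True)
    return docs_n @ classes_n.T


def classify_by_similarity(doc_vecs: np.ndarray, class_vecs: np.ndarray, labels: list) -> list:
    """Assign each document the label of its most similar class vector."""
    sims = cosine_similarity_matrix(doc_vecs, class_vecs)
    return [labels[i] for i in sims.argmax(axis=1)]


# Hypothetical toy embeddings; in practice these would come from an
# embedding model applied to documents and to class descriptions.
class_labels = ["sports", "medicine"]
class_vecs = np.array([[1.0, 0.1, 0.0],
                       [0.0, 0.2, 1.0]])
doc_vecs = np.array([[0.9, 0.3, 0.1],
                     [0.1, 0.1, 0.8]])

predictions = classify_by_similarity(doc_vecs, class_vecs, class_labels)
print(predictions)
```

No labeled training documents are needed in this setup: only the class descriptions and the unlabeled documents are embedded, which is what makes the approach applicable to unseen classes.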

Code

sebischair/lbl2vec official (168 stars)

sebischair/medical-abstracts-tc-cor… official

Tasks

Classification

text-classification

Text Classification

Unsupervised Text Classification

Zero-Shot Text Classification

Datasets

Introduced in the Paper:

Medical Abstracts

Used in the Paper:

AG News • Yahoo! Answers • 20NewsGroups • AG’s Corpus

Results from the Paper

Ranked #1 on Unsupervised Text Classification on AG News (F1-score metric)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Unsupervised Text Classification	20NewsGroups	Lbl2TransformerVec	F1-score	64.69	#2
Unsupervised Text Classification	AG News	Lbl2TransformerVec	F1-score	83.79	#1
Unsupervised Text Classification	Medical Abstracts	Lbl2TransformerVec	F1-score	56.46	#1
Unsupervised Text Classification	Medical Abstracts	Lbl2Vec	F1-score	43.03	#2
Unsupervised Text Classification	Yahoo! Answers	Lbl2TransformerVec	F1-score	55.84	#1

Methods

Lbl2TransformerVec • Lbl2Vec • SBERT • SimCSE • Skip-gram Word2Vec
