Text Classification

1107 papers with code • 93 benchmarks • 136 datasets

Text Classification is the task of assigning a sentence or document an appropriate category. The categories depend on the chosen dataset and can range from topics.

Text Classification problems include emotion classification, news classification, citation intent classification, among others. Benchmark datasets for evaluating text classification capabilities include GLUE, AGNews, among others.

In recent years, deep learning techniques like XLNet and RoBERTa have attained some of the biggest performance jumps for text classification problems.

( Image credit: Text Classification Algorithms: A Survey )

Benchmarks

Add a Result

These leaderboards are used to track progress in Text Classification

Dataset	Best Model	Compare
MTEB	ST5-XXL	See all
AG News	XLNet	See all
DBpedia	XLNet	See all
R8	DeBERTa	See all
TREC-6	Automatic Label Error Correction	See all
20NEWS	LinearSVM+TFIDF	See all
IMDb	BERT-ITPT-FiT	See all
Ohsumed	RoBERTaGCN	See all
Yahoo! Answers	BERT-ITPT-FiT	See all
MR	VLAWE	See all
R52	1-6 BertGCN	See all
NewsDiscourse	Human (Post-Rec.) (Spangher et al., 2021)	See all
Yelp-5	HAHNN (CNN)	See all
DODF Data	ULMFiT (pre-trained vocab, no gradual unfreezing)	See all
MVICTOR (type)	CNN + CRF	See all
SVICTOR (type)	CNN + CRF	See all
WeeBit (Readability Assessment)	BERT-FP-LBL	See all
OneStopEnglish (Readability Assessment)	RoBERTa-RF-T1 hybrid	See all
Lot-insts	Character-BERT+RS	See all
Yelp-2	XLNet	See all
Amazon-2	XLNet	See all
RCV1	One-hot CNN+ Johnson & Zhang ([2016b])	See all
arXiv-10	Protoformer	See all
HateXplain	Space-XLNet	See all
Sogou News	BERT-ITPT-FiT	See all
Amazon-5	XLNet	See all
Overruling	Custom Legal-BERT	See all
Terms of Service	Custom Legal-BERT	See all
BLURB	BioLinkBERT (large)	See all
Twitter	ERNIE 2.0	See all
IMDb Movie Reviews	Logistic Regression	See all
TREC-50	Rules	See all
An Amharic News Text classification Dataset	Naive Bayes using Tf-idf features	See all
GLUE SST2	TRANS-BLSTM	See all
MuLD (Character Type)	Longformer	See all
Searchsnippets	DistilBERT	See all
SST-2	DeBERTa	See all
This is not a Dataset	Vicuna13B v1.1	See all
Social media attributions of YouTube comments	Space-BERT	See all
TRAC2-Benghali. Task 2.	BERT	See all
TRAC2-English. Task2.	BERT	See all
AffCon 2020 Emotion Detection	BERT-based Ensembles	See all
Arxiv HEP-TH citation graph	BigBird	See all
Patents	BigBird	See all
Hyperpartisan News Detection	BigBird	See all
Facebook Media	Our proposed method Model Averaging(D + E + F)	See all
Twitter-US	Our proposed method Model Averaging(D + E + F)	See all
RusAge: Corpus for Age-Based Text Classification	LSVC + linguistic features + publishing attributes	See all
20 Newsgroups	RoBERTaGCN	See all
SILICONE Benchmark	Pretrained Hierarchical Transformer	See all
WNUT-2020 Task 2	NutCracker	See all
GLUE MRPC	TRANS-BLSTM	See all
GLUE COLA	TRANS-BLSTM	See all
GLUE RTE	TRANS-BLSTM	See all
GLUE STSB	TRANS-BLSTM	See all
BANKING77	RoBERTa-Large + ICDA	See all
Adverse Drug Events (ADE) Corpus	Spark NLP	See all
TREC-10	BERT	See all
NICE-45	BERT	See all
NICE-2	RoBERTa	See all
STOPS-41	DeBERTa	See all
STOPS-2	ERNIE 2.0	See all
Twitter Sentiment Analysis	Logistic Regression	See all
FMC-MWO2KG	Flair	See all

Show all 92 benchmarks

Collapse benchmarks

Libraries

Use these libraries to find Text Classification models and implementations

huggingface/transformers

9 papers

125,425

Datasets

Subtasks

Emotion Classification

Multi-Label Text Classification

Few-Shot Text Classification

Text Categorization

Semi-Supervised Text Classification

Coherence Evaluation

Toxic Comment Classification

Citation Intent Classification

Cross-Domain Text Classification

Unsupervised Text Classification

Satire Detection

Hierarchical Text Classification of Blurbs (GermEval 2019)

Variable Detection

Latest papers

Most implemented Social Latest No code

DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation

arumaekawa/dilm • • 30 Mar 2024

To address this issue, we propose a novel text dataset distillation approach, called Distilling dataset into Language Model (DiLM), which trains a language model to generate informative synthetic training samples as text data, instead of directly optimizing synthetic samples.

30 Mar 2024

Paper
Code

HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification

rooooyy/hill • • 26 Mar 2024

Existing self-supervised methods in natural language processing (NLP), especially hierarchical text classification (HTC), mainly focus on self-supervised contrastive learning, extremely relying on human-designed augmentation rules to generate contrastive samples, which can potentially corrupt or distort the original information.

26 Mar 2024

Paper
Code

LlamBERT: Large-scale low-cost data annotation in NLP

aielte-research/llambert • 23 Mar 2024

Large Language Models (LLMs), such as GPT-4 and Llama 2, show remarkable proficiency in a wide range of natural language processing (NLP) tasks.

23 Mar 2024

Paper
Code

A Model Ensemble Approach with LLM for Chinese Text Classification

swaggy66/Chinese-Text-Classification • China Health Information Processing Conference, 2023 2024

Automatic medical text categorization can assist doctors in efficiently managing patient information.

22 Mar 2024

Paper
Code

SpikeGraphormer: A High-Performance Graph Transformer with Spiking Graph Attention

phd-lanyu/spikegraphormer • • 21 Mar 2024

In this work, we propose a novel insight into integrating SNNs with Graph Transformers and design a Spiking Graph Attention (SGA) module.

21 Mar 2024

Paper
Code

SynerMix: Synergistic Mixup Solution for Enhanced Intra-Class Cohesion and Inter-Class Separability in Image Classification

wxitxy/complementary_intra-class_and_inter-class_mixup • • 21 Mar 2024

It also surpasses the top-performer of either Manifold MixUp or SynerMix-Intra by 0. 12% to 5. 16%, with an average gain of 1. 11%.

21 Mar 2024

Paper
Code

Investigating Text Shortening Strategy in BERT: Truncation vs Summarization

mirzaalimm/truncationvssummarization • 19 Mar 2024

In this study, we investigate the performance of document truncation and summarization in text classification tasks.

19 Mar 2024

Paper
Code

Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning

andychiangsh/pre-cofactv3 • • 15 Mar 2024

In this paper, we present Pre-CoFactv3, a comprehensive framework comprised of Question Answering and Text Classification components for fact verification.

15 Mar 2024

Paper
Code

Defending Against Unforeseen Failure Modes with Latent Adversarial Training

thestephencasper/latent_adversarial_training • • 8 Mar 2024

Despite extensive diagnostics and debugging by developers, AI systems sometimes exhibit harmful unintended behaviors.

08 Mar 2024

Paper
Code

RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules

miaomiaoli2/ruleprompt • • 5 Mar 2024

Weakly supervised text classification (WSTC), also called zero-shot or dataless text classification, has attracted increasing attention due to its applicability in classifying a mass of texts within the dynamic and open Web environment, since it requires only a limited set of seed words (label names) for each category instead of labeled data.

05 Mar 2024

Paper
Code

Text Classification

Benchmarks Add a Result

Libraries

Datasets

Subtasks

Latest papers

Content

Benchmarks

Add a Result