Text Classification
1101 papers with code • 150 benchmarks • 148 datasets
Text Classification is the task of assigning a sentence or document an appropriate category. The categories depend on the chosen dataset and can range from topics to sentiments and emotions.
Text Classification problems include emotion classification, news classification, and citation intent classification, among others. Benchmark datasets for evaluating text classification capabilities include GLUE and AGNews.
In recent years, deep learning techniques like XLNet and RoBERTa have attained some of the biggest performance jumps for text classification problems.
( Image credit: Text Classification Algorithms: A Survey )
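To make the task concrete, here is a minimal sketch of a text classifier: a multinomial Naive Bayes model with add-one smoothing, written with only the standard library. The labels and training sentences are invented for illustration; real systems would use the deep models discussed below.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (text, label) pairs.
    Returns per-label counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(text, label_counts, word_counts, vocab):
    """Pick the label maximizing log prior + smoothed log likelihood."""
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)
        # add-one (Laplace) smoothing over the shared vocabulary
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

samples = [
    ("the team won the match", "sports"),
    ("great goal in the final game", "sports"),
    ("stocks fell sharply today", "finance"),
    ("the market rallied after earnings", "finance"),
]
model = train(samples)
print(predict("the final match today", *model))  # prints "sports"
```

Even this tiny model captures the core of the task: map a bag of words to the most probable category. The deep methods surveyed on this page replace the bag-of-words likelihood with learned contextual representations.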
Libraries
Use these libraries to find Text Classification models and implementations
Subtasks
- Topic Models
- Document Classification
- Sentence Classification
- Emotion Classification
- Multi-Label Text Classification
- Few-Shot Text Classification
- Text Categorization
- Semi-Supervised Text Classification
- Coherence Evaluation
- Toxic Comment Classification
- Citation Intent Classification
- Cross-Domain Text Classification
- Unsupervised Text Classification
- Satire Detection
- Hierarchical Text Classification of Blurbs (GermEval 2019)
- Variable Detection
Most implemented papers
Universal Sentence Encoder
For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.
Unsupervised Data Augmentation for Consistency Training
In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
RoFormer: Enhanced Transformer with Rotary Position Embedding
Then, we propose a novel method named Rotary Position Embedding(RoPE) to effectively leverage the positional information.
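The core idea of RoPE can be sketched in a few lines: each pair of feature dimensions (x_{2i}, x_{2i+1}) is rotated by the angle pos · θ_i, with θ_i = 10000^(−2i/d), so that the dot product between a rotated query and key depends only on their relative position. A minimal, unvectorized standard-library sketch (real implementations vectorize this over the whole sequence):

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at position pos."""
    d = len(x)
    assert d % 2 == 0, "feature dimension must be even"
    out = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)   # per-pair rotation frequency
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = x[2 * i], x[2 * i + 1]
        # standard 2-D rotation of the pair
        out.extend([x1 * c - x2 * s, x1 * s + x2 * c])
    return out
```

Because rotations preserve norms and the query–key dot product depends only on the angle difference per pair, shifting both positions by the same offset leaves attention scores unchanged.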
How to Fine-Tune BERT for Text Classification?
Language model pre-training has proven to be useful in learning universal language representations.
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
We present EDA: easy data augmentation techniques for boosting performance on text classification tasks.
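Two of the four EDA operations, random swap and random deletion, need no external resources and can be sketched directly (the other two, synonym replacement and random insertion, require a thesaurus such as WordNet). A minimal standard-library sketch:

```python
import random

def random_swap(words, n, rng):
    """Swap two randomly chosen word positions, n times."""
    words = words[:]
    for _ in range(n):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    """Delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

rng = random.Random(0)
sentence = "the quick brown fox jumps".split()
print(random_swap(sentence, 2, rng))
print(random_deletion(sentence, 0.3, rng))
```

Applied to a fraction of the training set, such perturbed copies act as cheap augmentation for small-data text classification.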
Parameter-Efficient Transfer Learning for NLP
On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task.
FNet: Mixing Tokens with Fourier Transforms
At longer input lengths, our FNet model is significantly faster: when compared to the "efficient" Transformers on the Long Range Arena benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all sequence lengths on GPUs (and across relatively shorter lengths on TPUs).
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
Most tasks in natural language processing can be cast into question answering (QA) problems over language input.
Simple Recurrent Units for Highly Parallelizable Recurrence
Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations.