Text Classification

1102 papers with code • 150 benchmarks • 148 datasets

Text Classification is the task of assigning an appropriate category to a sentence or document. The categories depend on the chosen dataset and can range from broad topics to fine-grained labels such as emotions or citation intents.

Text Classification problems include emotion classification, news classification, and citation intent classification, among others. Benchmark datasets for evaluating text classification capabilities include GLUE and AGNews, among others.
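As a concrete illustration of the task, here is a minimal sketch of a topic classifier built on a TF-IDF plus logistic regression baseline. The example texts and labels are invented for illustration and are not drawn from any of the benchmarks above.

```python
# Minimal sketch: topic classification with a bag-of-words baseline.
# The toy texts and labels are illustrative, not from any benchmark dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "The central bank raised interest rates again.",
    "The striker scored twice in the final match.",
    "New GPU architecture doubles inference throughput.",
]
train_labels = ["business", "sports", "tech"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_texts, train_labels)

print(clf.predict(["Quarterly earnings beat analyst expectations."]))  # e.g. ['business']
```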

In recent years, pretrained language models such as XLNet and RoBERTa have delivered some of the largest performance gains on text classification problems.
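For the deep learning route, a common recipe is to fine-tune a pretrained encoder with the Hugging Face Transformers Trainer. The sketch below assumes RoBERTa-base on AG News with illustrative, untuned hyperparameters and an arbitrary output directory name.

```python
# Hedged sketch: fine-tuning RoBERTa for text classification on AG News
# with Hugging Face Transformers. Hyperparameters are illustrative, not tuned.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

dataset = load_dataset("ag_news")                  # 4-way news topic classification
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=4)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-agnews", learning_rate=2e-5,
                           per_device_train_batch_size=16, num_train_epochs=2),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch dynamically
)
trainer.train()
print(trainer.evaluate())
```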

(Image credit: Text Classification Algorithms: A Survey)

Latest papers with no code

When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes

no code yet • 18 Apr 2024

We present FastFit, a method and Python package designed to provide fast and accurate few-shot classification, especially for scenarios with many semantically similar classes.

AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

no code yet • 17 Apr 2024

Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care.

A Novel ICD Coding Framework Based on Associated and Hierarchical Code Description Distillation

no code yet • 17 Apr 2024

To address these problems, we propose a novel framework based on associated and hierarchical code description distillation (AHDD) for better code representation learning and avoidance of improper code assignment; to this end, we utilize the code description and the hierarchical structure inherent to the ICD codes.

Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification

no code yet • 17 Apr 2024

This study is part of the debate on the efficiency of large versus small language models for text classification by prompting. We assess the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large models. Across 15 datasets, our investigation benchmarks language models from 77M to 40B parameters using different architectures and scoring functions.
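One common way to do zero-shot classification by prompting, in the spirit of the scoring functions benchmarked above, is to verbalize each candidate label into a template and rank labels by the language model's log-likelihood. The sketch below assumes GPT-2 and a sentiment-style template purely for illustration; it is not the paper's exact setup.

```python
# Hedged sketch of prompt-based zero-shot classification with a small causal LM:
# each label is verbalized into the prompt and scored by the model's mean token
# log-likelihood. The template and label words are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # ~124M parameters
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def score(text: str, label: str) -> float:
    prompt = f"Review: {text}\nSentiment: {label}"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        # .loss is the mean negative log-likelihood of the sequence tokens
        return -model(ids, labels=ids).loss.item()

text = "The film was a complete waste of two hours."
labels = ["positive", "negative"]
print(max(labels, key=lambda label: score(text, label)))   # expected: "negative"
```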

Incubating Text Classifiers Following User Instruction with Nothing but LLM

no code yet • 16 Apr 2024

In this paper, we aim to generate text classification data given arbitrary class definitions (i.e., user instructions), so one can train a small text classifier without any human annotation or raw corpus.
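A generic sketch of this idea, not the paper's pipeline: prompt an LLM to synthesize labeled examples from each user-written class definition, then fit a small classifier on the synthetic data. The `llm_generate` helper and the class definitions below are hypothetical placeholders.

```python
# Hedged sketch (not the paper's method): synthesize labeled data from class
# definitions with an LLM, then train a small classifier on it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def llm_generate(prompt: str) -> list[str]:
    """Hypothetical stand-in for an LLM call; replace with your own client.
    Returns canned strings here only so the sketch runs end to end."""
    return [f"synthetic example {i} for: {prompt[:40]}" for i in range(20)]

class_definitions = {  # hypothetical user instructions
    "refund_request": "messages asking for money back on a purchase",
    "shipping_issue": "messages about late, lost, or damaged deliveries",
}

texts, labels = [], []
for label, definition in class_definitions.items():
    prompt = f"Write 20 short customer messages that are {definition}, one per line."
    for example in llm_generate(prompt):
        texts.append(example)
        labels.append(label)

small_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
small_clf.fit(texts, labels)   # trained without human annotation or a raw corpus
```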

Quantization of Large Language Models with an Overdetermined Basis

no code yet • 15 Apr 2024

In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation.

OTTER: Improving Zero-Shot Classification via Optimal Transport

no code yet • 12 Apr 2024

Popular zero-shot models suffer due to artifacts inherited from pretraining.
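The title points to correcting zero-shot predictions with optimal transport. A generic sketch of that idea, not necessarily the paper's OTTER algorithm, is to rebalance a batch of prediction scores with Sinkhorn iterations so that the induced label marginal matches a specified prior; the function name, prior, and iteration count below are illustrative.

```python
# Hedged sketch of rebalancing zero-shot predictions via Sinkhorn iterations so
# that the label marginal matches a target prior (not the paper's exact method).
import numpy as np

def sinkhorn_rebalance(probs: np.ndarray, prior: np.ndarray, iters: int = 50):
    """probs: (N, C) zero-shot probabilities; prior: (C,) target label marginal."""
    P = probs.copy()
    row = np.full(len(P), 1.0 / len(P))           # uniform mass per example
    for _ in range(iters):
        P *= (row / P.sum(axis=1))[:, None]       # match row marginals
        P *= (prior / P.sum(axis=0))[None, :]     # match column (label) marginals
    return P / P.sum(axis=1, keepdims=True)       # renormalize per example

probs = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1]])
print(sinkhorn_rebalance(probs, prior=np.array([0.2, 0.3, 0.5])).argmax(axis=1))
```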

VertAttack: Taking advantage of Text Classifiers' horizontal vision

no code yet • 12 Apr 2024

In contrast, humans are easily able to recognize and read words written both horizontally and vertically.
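A toy illustration of the underlying idea, not the authors' VertAttack implementation: rewriting a chosen word character-by-character on separate lines keeps it readable to humans while disrupting a classifier's horizontal tokenization. The `verticalize` helper is hypothetical.

```python
# Toy sketch: rewrite selected words vertically so they stay human-readable but
# no longer appear as single horizontal tokens to a text classifier.
def verticalize(text: str, targets: set[str]) -> str:
    out = []
    for word in text.split():
        out.append("\n".join(word) if word.lower() in targets else word)
    return " ".join(out)

print(verticalize("this movie was terrible", {"terrible"}))
# this movie was t
# e
# r
# r
# i
# b
# l
# e
```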

Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification

no code yet • 12 Apr 2024

In this paper, we conduct an in-depth study of supervised contrastive learning and its influence on representation in MLTC context.
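For reference, the standard supervised contrastive (SupCon-style) loss that such studies build on can be written compactly over L2-normalized text embeddings. The sketch below uses the single-label form for brevity, whereas the paper concerns multi-label (MLTC) settings; `supcon_loss` and the temperature value are illustrative.

```python
# Hedged sketch of a supervised contrastive loss over text embeddings:
# same-label pairs are pulled together, all others pushed apart.
import torch
import torch.nn.functional as F

def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    z = F.normalize(embeddings, dim=1)                        # (N, d) unit vectors
    sim = z @ z.T / tau                                       # temperature-scaled similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)                    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask   # same-label pairs
    # per-anchor mean log-probability over its positives (0 if it has none)
    return -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)

loss = supcon_loss(torch.randn(8, 32), torch.randint(0, 3, (8,))).mean()
```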

Interactive Prompt Debugging with Sequence Salience

no code yet • 11 Apr 2024

We present Sequence Salience, a visual tool for interactive prompt debugging with input salience methods.
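Sequence Salience itself is a visual tool, but the input salience methods it builds on can be sketched generically. Below is a gradient-times-input salience computation for a text classifier, assuming a public DistilBERT SST-2 checkpoint and an example sentence chosen purely for illustration.

```python
# Hedged, generic sketch of one input-salience method (gradient x input) for a
# text classifier; this is not the Sequence Salience tool itself.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

inputs = tokenizer("the plot was dull but the acting was superb", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"]).logits
logits[0, logits.argmax()].backward()              # gradient of the predicted class

salience = (embeds.grad * embeds).sum(-1).squeeze(0)   # gradient x input per token
for tok, s in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), salience):
    print(f"{tok:>12s} {s.item():+.3f}")
```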