Text Classification

1104 papers with code • 150 benchmarks • 148 datasets

Text Classification is the task of assigning a sentence or document an appropriate category. The categories depend on the chosen dataset and can range from broad topics (e.g., news categories) to finer-grained labels such as emotions or citation intents.

Text Classification problems include emotion classification, news classification, and citation intent classification, among others. Benchmark datasets for evaluating text classification capabilities include GLUE and AGNews.

In recent years, deep learning models such as XLNet and RoBERTa have delivered some of the largest performance gains on text classification problems.

(Image credit: Text Classification Algorithms: A Survey)
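
As a concrete illustration of the task, the sketch below labels a few sentences with an off-the-shelf fine-tuned model; the checkpoint name is an assumption, and any text-classification checkpoint works in its place.

```python
# Minimal text-classification sketch: label a few sentences with a
# pretrained, fine-tuned model. The checkpoint name is an assumption;
# any text-classification checkpoint on the Hugging Face Hub works.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # assumed checkpoint
)

texts = [
    "The central bank left interest rates unchanged.",
    "I absolutely loved this film!",
]
for text, pred in zip(texts, classifier(texts)):
    print(f"{pred['label']:>8}  ({pred['score']:.3f})  {text}")
```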

PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure

adlnlp/peach 21 Apr 2024

In this work, we propose a novel tree-based explanation technique, PEACH (Pretrained-embedding Explanation Across Contextual and Hierarchical Structure), that can explain how text-based documents are classified by using any pretrained contextual embeddings in a tree-based human-interpretable manner.
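
As a rough illustration of the general recipe (not the PEACH algorithm itself), one can encode documents with a pretrained model and fit an interpretable decision tree on the resulting embeddings; the checkpoint, pooling, and toy data below are assumptions.

```python
# Hedged sketch of tree-based explanation over pretrained embeddings.
# This is NOT the PEACH method, only an illustration of fitting an
# interpretable tree on contextual embedding features.
import torch
from sklearn.tree import DecisionTreeClassifier, export_text
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Mean-pooled contextual embeddings for a list of texts."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Toy labeled documents (1 = positive, 0 = negative), for illustration only.
texts = ["great plot and acting", "boring and far too long",
         "a delightful surprise", "a waste of two hours"]
labels = [1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=3).fit(embed(texts), labels)
print(export_text(tree))  # human-readable splits over embedding dimensions
```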

IMO: Greedy Layer-Wise Sparse Representation Learning for Out-of-Distribution Text Classification with Pre-trained Models

williamstoto/imo 21 Apr 2024

Machine learning models have made incredible progress, but they still struggle when applied to examples from unseen domains.

Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge

unimorph/umlabeller 20 Apr 2024

Our empirical findings show that the accuracy of UniMorph Labeller is 98%, and that, in all language models studied (including ALBERT, BERT, RoBERTa, and DeBERTa), alien tokenization leads to poorer generalizations compared to morphological tokenization for semantic compositionality of word meanings.
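
To see what subword composition that ignores morphology looks like, the small sketch below prints how a standard WordPiece tokenizer splits whole words into pieces, which may or may not line up with their morphemes; the tokenizer checkpoint is an assumption and this is not the UniMorph Labeller tool.

```python
# Illustration of subword splits that may not match morpheme boundaries.
# The checkpoint is an assumption; any BERT-style tokenizer shows the effect.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
for word in ["unhappiness", "antidisestablishmentarianism", "running"]:
    pieces = tokenizer.tokenize(word)
    # Pieces prefixed with "##" are word-internal continuations; comparing
    # them with the word's morphemes shows whether the split is "alien".
    print(f"{word:>30} -> {pieces}")
```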

VI-OOD: A Unified Representation Learning Framework for Textual Out-of-distribution Detection

faceonlive/ai-research 9 Apr 2024

Out-of-distribution (OOD) detection plays a crucial role in ensuring the safety and reliability of deep neural networks in various applications.

Multi-Task Learning for Features Extraction in Financial Annual Reports

faceonlive/ai-research 8 Apr 2024

For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information.

Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales

visual-ds/plausible-nlp-explanations 3 Apr 2024

By leveraging a multi-objective optimization algorithm, we explore the trade-off between the two loss functions and generate a Pareto-optimal frontier of models that balance performance and plausibility.
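
A much-simplified stand-in for this idea (the paper uses a proper multi-objective optimizer; the objective curves below are made up) is to sweep a scalarization weight between the two objectives and keep the non-dominated points:

```python
# Simplified sketch of a performance/plausibility trade-off: sweep a mixing
# weight between two toy objectives and keep the Pareto-optimal
# (non-dominated) points. The objective functions are purely illustrative.
import numpy as np

def evaluate(alpha):
    """Toy proxy: weight `alpha` on the task objective yields some task
    score and some plausibility score (made-up, smooth curves)."""
    task_score = 0.9 - 0.3 * (1 - alpha) ** 2
    plausibility = 0.9 - 0.3 * alpha ** 2
    return task_score, plausibility

points = [evaluate(a) for a in np.linspace(0.0, 1.0, 11)]

# A point is Pareto-optimal if no other point is at least as good on both axes.
frontier = [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]
for task_score, plausibility in sorted(frontier):
    print(f"task={task_score:.3f}  plausibility={plausibility:.3f}")
```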

Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations

villacu/ic_xlt 3 Apr 2024

Zero-Shot Cross-lingual Transfer (ZS-XLT) utilizes a model trained in a source language to make predictions in another language, often with a performance loss.
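
A minimal sketch of the zero-shot transfer setup (not this paper's in-context method) is to fit a classifier on source-language features from a multilingual encoder and apply it directly to target-language text; the checkpoint and toy data are assumptions.

```python
# Hedged sketch of zero-shot cross-lingual transfer: train on English,
# predict on Spanish, sharing a multilingual encoder. Checkpoint and toy
# examples are assumptions, not the paper's setup.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Source language: English training examples (1 = positive, 0 = negative).
train_texts = ["I loved it", "terrible service", "wonderful experience", "very bad"]
clf = LogisticRegression().fit(embed(train_texts), [1, 0, 1, 0])

# Target language: Spanish examples never seen during training.
print(clf.predict(embed(["me encantó la película", "el servicio fue terrible"])))
```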

FPT: Feature Prompt Tuning for Few-shot Readability Assessment

wzy232303/fpt 3 Apr 2024

Our proposed method establishes a new architecture for prompt tuning that sheds light on how linguistic features can be easily adapted to linguistic-related tasks.
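
As generic background (this is plain soft prompt tuning, not the FPT architecture), prompt tuning freezes the backbone and trains only a handful of prompt vectors prepended to the input embeddings, plus a small head:

```python
# Generic soft-prompt-tuning sketch: freeze the encoder, learn only a few
# prompt embeddings and a classification head. Not the FPT architecture;
# checkpoint, sizes, and the toy batch are assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
backbone = AutoModel.from_pretrained("bert-base-uncased")
for p in backbone.parameters():
    p.requires_grad = False  # only the prompt and head are trained

n_prompt, dim, n_classes = 8, backbone.config.hidden_size, 2
prompt = nn.Parameter(torch.randn(n_prompt, dim) * 0.02)
head = nn.Linear(dim, n_classes)

def forward(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    tok_emb = backbone.get_input_embeddings()(batch["input_ids"])
    b = tok_emb.size(0)
    inputs = torch.cat([prompt.unsqueeze(0).expand(b, -1, -1), tok_emb], dim=1)
    mask = torch.cat([torch.ones(b, n_prompt, dtype=torch.long),
                      batch["attention_mask"]], dim=1)
    hidden = backbone(inputs_embeds=inputs, attention_mask=mask).last_hidden_state
    return head(hidden[:, 0])  # classify from the first prompt position

optimizer = torch.optim.AdamW([prompt, *head.parameters()], lr=1e-3)
loss = nn.functional.cross_entropy(forward(["an easy sentence", "a harder one"]),
                                   torch.tensor([0, 1]))
loss.backward()
optimizer.step()
```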

DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation

arumaekawa/dilm 30 Mar 2024

To address this issue, we propose a novel text dataset distillation approach, called Distilling dataset into Language Model (DiLM), which trains a language model to generate informative synthetic training samples as text data, instead of directly optimizing synthetic samples.
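
The general flavor of the idea (not the DiLM training objective) can be sketched as: sample synthetic labeled texts from a generative language model, then train a downstream classifier on the synthetic data alone; the generator checkpoint and prompts below are assumptions.

```python
# Hedged sketch: generate synthetic labeled texts with a language model
# and train a small classifier on them. This is NOT the DiLM procedure;
# the generator checkpoint and prompts are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def synthesize(prompt, n):
    outs = generator(prompt, max_new_tokens=20, do_sample=True,
                     num_return_sequences=n)
    return [o["generated_text"] for o in outs]

# One generation prompt per class label (illustrative only).
synthetic_texts, synthetic_labels = [], []
for label, prompt in enumerate(["A positive movie review:",
                                "A negative movie review:"]):
    texts = synthesize(prompt, n=4)
    synthetic_texts += texts
    synthetic_labels += [label] * len(texts)

# Train on the synthetic samples instead of the original dataset.
features = TfidfVectorizer().fit_transform(synthetic_texts)
clf = LogisticRegression(max_iter=1000).fit(features, synthetic_labels)
print(clf.score(features, synthetic_labels))
```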

HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification

rooooyy/hill 26 Mar 2024

Existing self-supervised methods in natural language processing (NLP), especially for hierarchical text classification (HTC), mainly focus on self-supervised contrastive learning and rely heavily on human-designed augmentation rules to generate contrastive samples, which can corrupt or distort the original information.
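
For context, the augmentation-based contrastive baseline described above looks roughly like the sketch below: positive pairs come from a hand-designed rule (random word dropout here) and an InfoNCE-style loss pulls the two views together; the encoder, rule, and sentences are assumptions.

```python
# Sketch of augmentation-based contrastive learning for text: positives are
# created by a human-designed rule (word dropout) and matched with an
# InfoNCE-style loss. Encoder choice, rule, and sentences are assumptions.
import random
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def word_dropout(text, p=0.2):
    """Hand-designed augmentation rule: randomly drop words."""
    kept = [w for w in text.split() if random.random() > p]
    return " ".join(kept) or text

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]  # [CLS] vectors

texts = ["markets rallied after the announcement",
         "the team won its third straight title"]
views_a = encode(texts)
views_b = encode([word_dropout(t) for t in texts])

# InfoNCE-style loss: each sentence should match its own augmented view.
sims = F.cosine_similarity(views_a.unsqueeze(1), views_b.unsqueeze(0), dim=-1)
loss = F.cross_entropy(sims / 0.05, torch.arange(len(texts)))
print(loss.item())
```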
