Document Classification

207 papers with code • 19 benchmarks • 15 datasets

Document Classification is the task of assigning one or more labels to a document from a predetermined set of labels.

Source: Long-length Legal Document Classification
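
The definition above can be illustrated with a minimal sketch. The label set, the keyword lists, the `classify` function, and the `min_hits` threshold are all hypothetical and invented for this example; the models on the leaderboards below learn their decision rules from data rather than using hand-written keywords.

```python
from collections import Counter
from typing import List

# Hypothetical predetermined label set with seed keywords (illustration only).
LABEL_KEYWORDS = {
    "sports": {"match", "team", "goal", "league"},
    "finance": {"market", "shares", "profit", "bank"},
    "legal": {"court", "contract", "statute", "appeal"},
}


def classify(document: str, min_hits: int = 2) -> List[str]:
    """Return every label whose keywords occur at least `min_hits` times."""
    # Naive whitespace tokenisation; real systems use proper tokenisers.
    tokens = Counter(document.lower().split())
    labels = []
    for label, keywords in LABEL_KEYWORDS.items():
        hits = sum(tokens[word] for word in keywords)
        if hits >= min_hits:
            labels.append(label)
    return labels


doc = "The court ruled that the bank must repay profit owed under the contract"
print(classify(doc))  # ['finance', 'legal']
```

The example document receives two labels, which shows the "one or more labels" part of the definition; a document that matches no keyword list receives an empty list.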

Benchmarks

These leaderboards track progress in Document Classification.

Dataset	Best Model
Reuters-21578	MPAD-path
Cora	ACNet
HOC	BioLinkBERT (large)
BBCSport	MPAD-path
Amazon	ApproxRepSet
Twitter	ApproxRepSet
WOS-5736	ConvTextTM
IMDb-M	Document Classification Using Importance of Sentences
AAPD	KD-LSTMreg
Classic	REL-RWMD k-NN
Recipe	ApproxRepSet
SciDocs (MAG)	SciNCL
SciDocs (MeSH)	SciNCL
WOS-11967	RMDL (30 RDLs)
WOS-46985	RMDL (30 RDLs)
Yelp-14	KD-LSTMreg
Reuters En-De	BilBOWA
Reuters De-En	BilBOWA
MPQA	MPAD-path

Libraries

Use these libraries to find Document Classification models and implementations.

huggingface/transformers • 2 papers • 124,889

sergioburdisso/pyss3 • 2 papers • 331

eske/multivec • 2 papers • 116

IllinoisGraphBenchmark/IGB-Datasets • 2 papers

Datasets

Subtasks

Page Stream Segmentation

Latest papers

Visually Guided Generative Text-Layout Pre-training for Document Intelligence

veason-silverbullet/vitlp • 25 Mar 2024

Prior studies show that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to perceive and reason about both document texts and layouts (e.g., locations of texts and table cells).

NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents

aiintelligentsystems/next-level-bert • 27 Feb 2024

While (large) language models have improved significantly in recent years, they still struggle to sensibly process long sequences found, e.g., in books, due to the quadratic scaling of the underlying attention mechanism.

Prompted Contextual Vectors for Spear-Phishing Detection

nahmiasd/prompted-contextual-vectors-for-spear-phishing-detection • 13 Feb 2024

Spear-phishing attacks present a significant security challenge, with large language models (LLMs) escalating the threat by generating convincing emails and facilitating target reconnaissance.

ANLS* -- A Universal Document Processing Metric for Generative Large Language Models

deepopinion/anls_star_metric • 6 Feb 2024

Evaluating generative large language models (GLLMs) presents a challenge, as the binary true-or-false evaluation used for discriminative models is not applicable to the predictions made by GLLMs.

GeoGalactica: A Scientific Large Language Model in Geoscience

geobrain-ai/geogalactica • 31 Dec 2023

To the best of our knowledge, GeoGalactica is the largest language model for the geoscience domain.

MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA

bruthyu/melo • 19 Dec 2023

Large language models (LLMs) have shown great success in various Natural Language Processing (NLP) tasks, whilst they still need updates after deployment to fix errors or keep pace with the changing knowledge in the world.

Summarization-based Data Augmentation for Document Classification

etsurin/summaug • 1 Dec 2023

Despite the prevalence of pretrained language models in natural language understanding tasks, understanding lengthy text such as documents is still challenging due to the data sparseness problem.

SUT: a new multi-purpose synthetic dataset for Farsi document image analysis

aliiafkari/SUT_Dataset • 13th International Conference on Computer and Knowledge Engineering (ICCKE) 2023

This paper introduces a new large-scale dataset for Farsi document images, named SUT, which aims to tackle the challenges associated with obtaining diverse and substantial ground-truth data for supervised models in document image analysis (DIA) tasks, such as document image classification, text detection and recognition, and information retrieval.

27 Nov 2023

ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models

ddhruvkr/contradoc • 15 Nov 2023

In recent times, large language models (LLMs) have shown impressive performance on various document-level tasks such as document classification, summarization, and question-answering.

Optimal Transport for Measures with Noisy Tree Metric

lttam/robustot-noisytreemetric • 20 Oct 2023

It is known that such an OT problem (i.e., tree-Wasserstein (TW)) admits a closed-form expression, but depends fundamentally on the underlying tree structure over supports of input measures.
