Document Classification

207 papers with code • 19 benchmarks • 15 datasets

Document Classification is the task of assigning one or more labels to a document from a predefined set of labels.

Source: Long-length Legal Document Classification
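
As a minimal sketch of the definition above, the function below assumes that some model has already produced one score per label. It turns those scores into a label assignment drawn only from a fixed label set. With `multi_label=True` it keeps every label whose score reaches a threshold, so the result can contain one or more labels. With `multi_label=False` it keeps only the single best label. The label names, the 0.5 threshold, and the `assign_labels` helper are hypothetical illustrations. They do not come from any listed paper or library.

```python
from typing import Dict, List, Sequence

# Hypothetical predetermined label set (illustrative only).
LABELS: Sequence[str] = ("contract", "court_ruling", "statute")


def assign_labels(
    scores: Dict[str, float],
    labels: Sequence[str] = LABELS,
    threshold: float = 0.5,
    multi_label: bool = True,
) -> List[str]:
    """Map per-label scores to labels from a fixed set.

    multi_label=True keeps every label whose score reaches `threshold`
    (possibly none); multi_label=False returns the single highest-scoring label.
    """
    unknown = set(scores) - set(labels)
    if unknown:
        # Labels outside the predetermined set are rejected, not silently kept.
        raise ValueError(f"labels not in the predefined set: {sorted(unknown)}")
    if multi_label:
        return [lab for lab in labels if scores.get(lab, 0.0) >= threshold]
    # Ties go to the earliest label in `labels`, because max() keeps the first maximum.
    return [max(labels, key=lambda lab: scores.get(lab, 0.0))]


if __name__ == "__main__":
    s = {"contract": 0.91, "court_ruling": 0.62, "statute": 0.08}
    print(assign_labels(s))                     # ['contract', 'court_ruling']
    print(assign_labels(s, multi_label=False))  # ['contract']
```

Rejecting unknown label names enforces the "predefined set" part of the definition. A scorer that invents a label therefore fails loudly instead of leaking that label into the output.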

Visually Guided Generative Text-Layout Pre-training for Document Intelligence

veason-silverbullet/vitlp 25 Mar 2024

Prior studies show that pre-training techniques can boost performance on visual document understanding (VDU), which typically requires models to perceive and reason about both document text and layout (e.g., the locations of text and table cells).

NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents

aiintelligentsystems/next-level-bert 27 Feb 2024

While (large) language models have improved significantly in recent years, they still struggle to process long sequences, such as those found in books, because the cost of the underlying attention mechanism scales quadratically with sequence length.

Prompted Contextual Vectors for Spear-Phishing Detection

nahmiasd/prompted-contextual-vectors-for-spear-phishing-detection 13 Feb 2024

Spear-phishing attacks present a significant security challenge, with large language models (LLMs) escalating the threat by generating convincing emails and facilitating target reconnaissance.

ANLS* -- A Universal Document Processing Metric for Generative Large Language Models

deepopinion/anls_star_metric 6 Feb 2024

However, evaluating GLLMs is challenging because the binary true-or-false evaluation used for discriminative models does not apply to the predictions GLLMs make.
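
For context, the classic ANLS metric (Average Normalized Levenshtein Similarity, used in document VQA benchmarks) replaces a binary right/wrong check with a graded string similarity. The sketch below implements that classic definition. It is not the ANLS* extension proposed in this paper. Each prediction scores 1 - NL against its closest reference answer, where NL is the Levenshtein distance divided by the length of the longer string. Any score with NL >= tau is set to zero, with tau = 0.5 by convention. The final metric is the mean over questions. The lowercasing and whitespace stripping are an assumed normalization that mirrors common implementations. The function names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match at cost 0
        prev = cur
    return prev[-1]


def nls(pred: str, gold: str, tau: float = 0.5) -> float:
    """1 - normalized Levenshtein distance, zeroed once the distance reaches tau."""
    p, g = pred.strip().lower(), gold.strip().lower()  # assumed normalization
    if not p and not g:
        return 1.0
    nl = levenshtein(p, g) / max(len(p), len(g))
    return 1.0 - nl if nl < tau else 0.0


def anls(preds: Sequence[str], golds: Sequence[Sequence[str]], tau: float = 0.5) -> float:
    """Mean over questions of the best score against any reference answer."""
    if not preds or len(preds) != len(golds) or any(len(r) == 0 for r in golds):
        raise ValueError("need one non-empty reference list per prediction")
    total = sum(max(nls(p, g, tau) for g in refs) for p, refs in zip(preds, golds))
    return total / len(preds)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                      # 3
    print(anls(["abcd", "ABC "], [["abce"], ["xyz", "abc"]]))  # 0.875
```

In the usage line, "abcd" is one edit away from "abce" (NL = 0.25, score 0.75), and "ABC " matches "abc" exactly after normalization (score 1.0). The mean is therefore 0.875. A binary exact-match metric would give 0.5 instead.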

GeoGalactica: A Scientific Large Language Model in Geoscience

geobrain-ai/geogalactica 31 Dec 2023

To the best of our knowledge, it is the largest language model for the geoscience domain.

MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA

bruthyu/melo 19 Dec 2023

Large language models (LLMs) have shown great success in various Natural Language Processing (NLP) tasks, while they still need updates after deployment to fix errors or keep pace with changing knowledge about the world.

Summarization-based Data Augmentation for Document Classification

etsurin/summaug 1 Dec 2023

Despite the prevalence of pretrained language models in natural language understanding tasks, understanding lengthy text such as documents remains challenging due to data sparseness.

SUT: a new multi-purpose synthetic dataset for Farsi document image analysis

aliiafkari/SUT_Dataset 27 Nov 2023 • 13th International Conference on Computer and Knowledge Engineering (ICCKE) 2023

This paper introduces a new large-scale dataset for Farsi document images, named SUT, which aims to tackle the challenges associated with obtaining diverse and substantial ground-truth data for supervised models in document image analysis (DIA) tasks, such as document image classification, text detection and recognition, and information retrieval.

ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models

ddhruvkr/contradoc 15 Nov 2023

In recent times, large language models (LLMs) have shown impressive performance on various document-level tasks such as document classification, summarization, and question-answering.

Optimal Transport for Measures with Noisy Tree Metric

lttam/robustot-noisytreemetric 20 Oct 2023

It is known that such an OT problem (i.e., tree-Wasserstein (TW)) admits a closed-form expression, but that expression depends fundamentally on the underlying tree structure over the supports of the input measures.
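
The standard tree-Wasserstein closed form can be sketched directly. On a tree with edge lengths, TW(μ, ν) = Σ_v w_v |μ(Γ(v)) - ν(Γ(v))|, where the sum runs over non-root nodes v. Here w_v is the length of the edge from v's parent to v, and Γ(v) is the set of nodes in the subtree rooted at v. The code below assumes the tree is given as a parent map plus per-node edge lengths. It also assumes that μ and ν are finite measures of equal total mass placed on tree nodes. It illustrates only this textbook formula. It is not the robust noisy-tree-metric method of the paper, and `tree_wasserstein` is a hypothetical name.

```python
import math
from typing import Dict, Hashable, Optional


def tree_wasserstein(
    parent: Dict[Hashable, Optional[Hashable]],
    weight: Dict[Hashable, float],
    mu: Dict[Hashable, float],
    nu: Dict[Hashable, float],
) -> float:
    """Closed-form tree-Wasserstein distance between two measures on tree nodes.

    parent: node -> parent node; the single root maps to None.
    weight: non-root node -> length of the edge (parent[node], node).
    mu, nu: node -> mass; nodes not listed carry zero mass.
    """
    for m in (mu, nu):
        extra = set(m) - set(parent)
        if extra:
            raise ValueError(f"mass on nodes not in the tree: {sorted(map(str, extra))}")
    if not math.isclose(sum(mu.values()), sum(nu.values())):
        raise ValueError("mu and nu must have equal total mass")

    # diff[u] ends up as mu(subtree(u)) - nu(subtree(u)): each node's mass
    # difference is added to the node itself and to every one of its ancestors.
    diff = {v: 0.0 for v in parent}
    for v in parent:
        d = mu.get(v, 0.0) - nu.get(v, 0.0)
        u: Optional[Hashable] = v
        while u is not None:
            diff[u] += d
            u = parent[u]

    # Each edge (parent(v), v) contributes its length times the mass imbalance below it.
    return sum(weight[v] * abs(diff[v]) for v in parent if parent[v] is not None)


if __name__ == "__main__":
    # Star tree: root "r" with leaves "a" (edge length 1) and "b" (edge length 2).
    parent = {"r": None, "a": "r", "b": "r"}
    weight = {"a": 1.0, "b": 2.0}
    print(tree_wasserstein(parent, weight, {"a": 1.0}, {"b": 1.0}))  # 3.0
```

Moving a unit of mass from leaf "a" to leaf "b" costs the path length 1 + 2 = 3. This shows that TW reduces to the tree's path metric for point masses. Changing the tree (its topology or edge lengths) changes every Γ(v) and w_v, which is exactly the dependence on tree structure noted above.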
