Document Classification

206 papers with code • 19 benchmarks • 15 datasets

Document classification is the task of assigning one or more labels to a document, where the labels are drawn from a predetermined set.

Source: Long-length Legal Document Classification
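
To make the definition concrete, here is a minimal sketch of the single-label case: a toy multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. It is an illustration only and is not the method of any paper or library listed on this page. The training texts, the label names ("legal", "sport"), and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training data: (document text, label) pairs.
# The label set {"legal", "sport"} is the "predetermined set of labels".
TRAIN = [
    ("the court ruled on the contract dispute", "legal"),
    ("the judge dismissed the appeal", "legal"),
    ("the team won the match in extra time", "sport"),
    ("the striker scored two goals", "sport"),
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}          # label -> Counter of token frequencies
    doc_counts = Counter()    # label -> number of training documents
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab


def log_scores(text, model):
    """Return the unnormalized log-posterior of each label for a document."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return scores


def classify(text, model):
    """Assign the single highest-scoring label to the document."""
    scores = log_scores(text, model)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(TRAIN)
    print(classify("the judge ruled on the appeal", model))
```

The sketch returns exactly one label per document. A multi-label variant would need a different decision rule, for example one independent yes/no classifier per label.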

Benchmarks

These leaderboards track progress in document classification.

Dataset	Best Model
Reuters-21578	MPAD-path
Cora	ACNet
HOC	BioLinkBERT (large)
BBCSport	MPAD-path
Amazon	ApproxRepSet
Twitter	ApproxRepSet
WOS-5736	ConvTextTM
IMDb-M	Document Classification Using Importance of Sentences
AAPD	KD-LSTMreg
Classic	REL-RWMD k-NN
Recipe	ApproxRepSet
SciDocs (MAG)	SciNCL
SciDocs (MeSH)	SciNCL
WOS-11967	RMDL (30 RDLs)
WOS-46985	RMDL (30 RDLs)
Yelp-14	KD-LSTMreg
Reuters En-De	BilBOWA
Reuters De-En	BilBOWA
MPQA	MPAD-path

Libraries

Use these libraries to find Document Classification models and implementations

huggingface/transformers	2 papers	124,527
sergioburdisso/pyss3	2 papers	331
eske/multivec	2 papers	116
IllinoisGraphBenchmark/IGB-Datasets	2 papers

See all 6 libraries.

Subtasks

Page Stream Segmentation

Latest papers with no code

BuDDIE: A Business Document Dataset for Multi-task Information Extraction

no code yet • 5 Apr 2024

Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia.

Developing Healthcare Language Model Embedding Spaces

no code yet • 28 Mar 2024

Pre-trained Large Language Models (LLMs) often struggle on out-of-domain datasets like healthcare focused text.

Clustering Document Parts: Detecting and Characterizing Influence Campaigns From Documents

no code yet • 27 Feb 2024

We propose a novel clustering pipeline to detect and characterize influence campaigns from documents.

NLP for Knowledge Discovery and Information Extraction from Energetics Corpora

no code yet • 10 Feb 2024

Furthermore, we present a document classification pipeline for energetics text.

Efficient Models for the Detection of Hate, Abuse and Profanity

no code yet • 8 Feb 2024

This is unacceptable in civil discourse. The detection of Hate, Abuse and Profanity in text is a vital component of creating civil and unbiased LLMs, which is needed not only for English, but for all languages.

Generalized Sobolev Transport for Probability Measures on a Graph

no code yet • 7 Feb 2024

In connection with the OW, we show that one only needs to simply solve a univariate optimization problem to compute the GST, unlike the complex two-level optimization problem in OW.

L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages

no code yet • 4 Jan 2024

This research contributes significantly to expanding the pool of available text classification datasets and also makes it possible to develop topic classification models for Indian regional languages.

A Learning oriented DLP System based on Classification Model

no code yet • 21 Dec 2023

Data is the key asset for organizations and data sharing is lifeline for organization growth; which may lead to data loss.

Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs

no code yet • 21 Dec 2023

In this paper, we develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models (PLMs).

Large language models in healthcare and medical domain: A review

no code yet • 12 Dec 2023

The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension.
