Document Classification

206 papers with code • 19 benchmarks • 15 datasets

Document Classification is the task of assigning one or more labels to a document from a predetermined set of labels.

Source: Long-length Legal Document Classification
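
To make the definition concrete, here is a minimal sketch of a document classifier. It is not taken from the cited paper: it assumes a bag-of-words multinomial Naive Bayes model with add-one smoothing, and the label set, the training sentences, and the function names (`tokenize`, `train`, `predict`) are all hypothetical. For simplicity it assigns exactly one label per document; a multi-label variant would typically train one binary classifier per label.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real systems use proper tokenization.
    return text.lower().split()


def train(docs):
    """Count label and word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set with a predetermined label set.
TRAIN = [
    ("the court ruled on the contract dispute", "legal"),
    ("the judge dismissed the appeal", "legal"),
    ("the patient received a new treatment", "medical"),
    ("the clinic reported fewer infections", "medical"),
]

model = train(TRAIN)
print(predict(model, "the judge ruled on the appeal"))
```

Calling `predict(model, text)` on a new document returns one label from the set seen during training. The toy data is far too small for meaningful accuracy and only illustrates the input and output of the task.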

Latest papers with no code

BuDDIE: A Business Document Dataset for Multi-task Information Extraction

no code yet • 5 Apr 2024

Several datasets exist for research on specific tasks in visually rich document understanding (VRDU), such as document classification (DC), key entity extraction (KEE), entity linking, and visual question answering (VQA), inter alia.

Developing Healthcare Language Model Embedding Spaces

no code yet • 28 Mar 2024

Pre-trained Large Language Models (LLMs) often struggle on out-of-domain datasets such as healthcare-focused text.

Clustering Document Parts: Detecting and Characterizing Influence Campaigns From Documents

no code yet • 27 Feb 2024

We propose a novel clustering pipeline to detect and characterize influence campaigns from documents.

NLP for Knowledge Discovery and Information Extraction from Energetics Corpora

no code yet • 10 Feb 2024

We present a document classification pipeline for energetics text.

Efficient Models for the Detection of Hate, Abuse and Profanity

no code yet • 8 Feb 2024

This is unacceptable in civil discourse. The detection of Hate, Abuse and Profanity in text is a vital component of creating civil and unbiased LLMs, and it is needed not only for English but for all languages.

Generalized Sobolev Transport for Probability Measures on a Graph

no code yet • 7 Feb 2024

In connection with Orlicz-Wasserstein (OW), we show that computing the generalized Sobolev transport (GST) only requires solving a univariate optimization problem, unlike the complex two-level optimization problem in OW.

L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages

no code yet • 4 Jan 2024

This research contributes significantly to expanding the pool of available text classification datasets and also makes it possible to develop topic classification models for Indian regional languages.

A Learning oriented DLP System based on Classification Model

no code yet • 21 Dec 2023

Data is the key asset for organizations, and data sharing is a lifeline for organizational growth, but sharing may lead to data loss.

Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs

no code yet • 21 Dec 2023

In this paper, we develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models (PLMs).

Large language models in healthcare and medical domain: A review

no code yet • 12 Dec 2023

The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension.