Document Classification
206 papers with code • 19 benchmarks • 15 datasets
Document Classification is the task of assigning one or more labels to a document from a predetermined set of labels.
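As a rough illustration of that definition, the sketch below assigns zero or more labels from a fixed label set using keyword overlap. The label names, keyword lists, `classify` function, and `min_hits` threshold are all hypothetical and chosen only for this example. Practical systems usually learn the mapping from labeled data instead of relying on hand-written keywords.

```python
import re

# Hypothetical predetermined label set with illustrative keywords per label.
LABEL_KEYWORDS = {
    "finance": {"invoice", "payment", "bank", "tax"},
    "healthcare": {"patient", "clinical", "diagnosis", "hospital"},
    "legal": {"contract", "court", "liability", "clause"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, label_keywords=LABEL_KEYWORDS, min_hits=1):
    """Return every label whose keywords match at least `min_hits` distinct tokens.

    A document can receive several labels, or none at all.
    """
    tokens = set(tokenize(text))
    return sorted(
        label
        for label, keywords in label_keywords.items()
        if len(tokens & keywords) >= min_hits
    )


if __name__ == "__main__":
    print(classify("The patient signed a contract with the hospital."))
```

Raising `min_hits` makes the assignment stricter, at the cost of leaving more documents unlabeled.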
Latest papers with no code
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia.
Developing Healthcare Language Model Embedding Spaces
Pre-trained Large Language Models (LLMs) often struggle on out-of-domain datasets like healthcare-focused text.
Clustering Document Parts: Detecting and Characterizing Influence Campaigns From Documents
We propose a novel clustering pipeline to detect and characterize influence campaigns from documents.
NLP for Knowledge Discovery and Information Extraction from Energetics Corpora
Furthermore, we present a document classification pipeline for energetics text.
Efficient Models for the Detection of Hate, Abuse and Profanity
This is unacceptable in civil discourse. The detection of Hate, Abuse and Profanity in text is a vital component of creating civil and unbiased LLMs, which is needed not only for English, but for all languages.
Generalized Sobolev Transport for Probability Measures on a Graph
In connection with the OW, we show that one only needs to simply solve a univariate optimization problem to compute the GST, unlike the complex two-level optimization problem in OW.
L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages
This research contributes significantly to expanding the pool of available text classification datasets and also makes it possible to develop topic classification models for Indian regional languages.
A Learning oriented DLP System based on Classification Model
Data is the key asset for organizations, and data sharing is a lifeline for organizational growth, but it may lead to data loss.
Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs
In this paper, we develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models (PLMs).
Large language models in healthcare and medical domain: A review
The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension.