Document Classification

207 papers with code • 19 benchmarks • 15 datasets

Document classification is the task of assigning a document one or more labels from a predetermined set of labels.

Source: Long-length Legal Document Classification
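The definition above can be made concrete with a minimal sketch. It is not taken from any listed paper: the label set, the keyword lists, and the function names `score_labels` and `classify` are hypothetical, chosen only to show that a document may receive zero, one, or several labels from a fixed set.

```python
from collections import Counter

# Hypothetical label set and keyword lists, invented for this illustration.
LABEL_KEYWORDS = {
    "sports": {"match", "team", "goal", "league"},
    "finance": {"market", "shares", "profit", "bank"},
    "legal": {"court", "contract", "appeal", "statute"},
}


def score_labels(text):
    """Count, for each label, how many tokens in the text match its keywords."""
    tokens = Counter(text.lower().split())
    return {
        label: sum(tokens[word] for word in keywords)
        for label, keywords in LABEL_KEYWORDS.items()
    }


def classify(text, threshold=1):
    """Return every label whose keyword count reaches the threshold (multi-label)."""
    scores = score_labels(text)
    return sorted(label for label, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    doc = "The court upheld the contract after the bank lost its appeal"
    print(classify(doc))               # labels with at least one keyword hit
    print(classify(doc, threshold=2))  # stricter: at least two keyword hits
```

Raising `threshold` trades recall for precision. A learned model would replace the keyword counts with predicted per-label scores, but the final step of selecting labels from a fixed set stays the same.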

Benchmarks

These leaderboards track progress in Document Classification.

Dataset	Best Model
Reuters-21578	MPAD-path
Cora	ACNet
HOC	BioLinkBERT (large)
BBCSport	MPAD-path
Amazon	ApproxRepSet
Twitter	ApproxRepSet
WOS-5736	ConvTextTM
IMDb-M	Document Classification Using Importance of Sentences
AAPD	KD-LSTMreg
Classic	REL-RWMD k-NN
Recipe	ApproxRepSet
SciDocs (MAG)	SciNCL
SciDocs (MeSH)	SciNCL
WOS-11967	RMDL (30 RDLs)
WOS-46985	RMDL (30 RDLs)
Yelp-14	KD-LSTMreg
Reuters En-De	BilBOWA
Reuters De-En	BilBOWA
MPQA	MPAD-path

Libraries

Use these libraries to find Document Classification models and implementations.

huggingface/transformers • 2 papers • 125,059 stars

sergioburdisso/pyss3 • 2 papers • 331 stars

eske/multivec • 2 papers • 116 stars

IllinoisGraphBenchmark/IGB-Datasets • 2 papers

See all 6 libraries.

Datasets

Subtasks

Page Stream Segmentation

Most implemented papers

Graph Attention Networks

PetarV-/GAT • ICLR 2018

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

Semi-Supervised Classification with Graph Convolutional Networks

tkipf/pygcn • 9 Sep 2016

We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs.

Revisiting Semi-Supervised Learning with Graph Embeddings

tkipf/gcn • 29 Mar 2016

We present a semi-supervised learning framework based on graph embeddings.

On Calibration of Modern Neural Networks

gpleiss/temperature_scaling • ICML 2017

Confidence calibration -- the problem of predicting probability estimates representative of the true correctness likelihood -- is important for classification models in many applications.
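To illustrate what "representative of the true correctness likelihood" means, the sketch below computes expected calibration error (ECE) over equal-width confidence bins. The choice of ECE as the measure, the function name, and the bin count are assumptions made for this example and are not stated in the summary above. A perfectly calibrated model scores 0.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     1 if the corresponding prediction was right, else 0
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


if __name__ == "__main__":
    # 75% confidence and 3 of 4 correct: confidence matches accuracy.
    print(expected_calibration_error([0.75] * 4, [1, 1, 1, 0]))
    # 75% confidence but always correct: the model is under-confident.
    print(expected_calibration_error([0.75] * 4, [1, 1, 1, 1]))
```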

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

facebookresearch/LASER • TACL 2019

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.

Improving Language Understanding by Generative Pre-Training

huggingface/transformers • Preprint 2018

We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

hazyresearch/flash-attention • 27 May 2022

We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.

ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations

sinovation/ZEN • Findings of the Association for Computational Linguistics 2020

Moreover, it is shown that reasonable performance can be obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data.

SPECTER: Document-level Representation Learning using Citation-informed Transformers

allenai/specter • ACL 2020

We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph.

GloVe: Global Vectors for Word Representation

stanfordnlp/GloVe • EMNLP 2014
