Document Classification

Document classification is the task of assigning one or more labels to a document from a predetermined set of labels.

Source: Long-length Legal Document Classification
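To make the definition concrete, here is a minimal sketch of assigning zero or more labels from a fixed label set. The labels, the keyword lists, and the `classify` helper are all hypothetical, chosen only for illustration; the systems listed below learn this mapping from data rather than from keyword rules.

```python
from typing import List

# Hypothetical label set with hand-picked keywords (illustration only).
LABEL_KEYWORDS = {
    "legal": {"court", "contract", "plaintiff"},
    "medical": {"patient", "diagnosis", "dose"},
    "finance": {"loan", "interest", "contract"},
}


def classify(document: str, min_hits: int = 1) -> List[str]:
    """Return every label whose keywords match at least `min_hits` tokens."""
    tokens = set(document.lower().split())
    return sorted(
        label
        for label, keywords in LABEL_KEYWORDS.items()
        if len(tokens & keywords) >= min_hits
    )


if __name__ == "__main__":
    print(classify("The court reviewed the loan contract"))
```

A document can receive several labels, one label, or none, which is what separates this multi-label setting from single-label classification.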

Improving Language Understanding by Generative Pre-Training

huggingface/transformers Preprint 2018

We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.

Document Classification Language Modelling

Pentagon at MEDIQA 2019: Multi-task Learning for Filtering and Re-ranking Answers using Language Inference and Question Entailment

google-research/bert WS 2019

Parallel deep learning architectures like fine-tuned BERT and MT-DNN have quickly become the state of the art, bypassing previous deep and shallow learning methods by a large margin.

Document Classification Multi-Task Learning

Semi-Supervised Classification with Graph Convolutional Networks

tkipf/gcn 9 Sep 2016

We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs.

Document Classification General Classification
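As a rough illustration of a convolution that operates directly on a graph, the sketch below applies one layer of the propagation rule commonly associated with graph convolutional networks, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). It uses plain Python lists instead of a tensor library, and the function names are assumptions made for this example, not the API of tkipf/gcn.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops, then scale symmetrically by the inverse square root of node degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbor features, transform, apply ReLU."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


if __name__ == "__main__":
    adjacency = [[0, 1], [1, 0]]      # two nodes joined by one edge
    node_features = [[1.0], [3.0]]    # one feature per node
    layer_weights = [[1.0]]           # 1x1 weight matrix
    print(gcn_layer(adjacency, node_features, layer_weights))
```

For document classification, each node would typically be a document and each edge a link such as a citation, so a node's representation mixes in the features of its neighbors.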

Pre-Training with Whole Word Masking for Chinese BERT

ymcui/Chinese-BERT-wwm 19 Jun 2019

In this technical report, we adapt whole word masking to Chinese text, masking whole words instead of individual Chinese characters, which could bring another challenge to the Masked Language Model (MLM) pre-training task.

Document Classification Document-level
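The difference between character-level and whole word masking can be sketched as follows. This is a simplified illustration that assumes the text has already been segmented into words by an external tool; the `whole_word_mask` helper, its parameters, and the always-substitute-`[MASK]` policy are assumptions for this example, not the procedure used in ymcui/Chinese-BERT-wwm.

```python
import random

MASK = "[MASK]"


def whole_word_mask(words, mask_prob=0.15, seed=0):
    """Mask every character of a selected word together.

    `words` is a pre-segmented sentence (list of strings). Each Chinese
    character becomes one token, and a word is either fully masked or
    left fully intact.
    """
    rng = random.Random(seed)
    tokens = []
    for word in words:
        chars = list(word)
        if rng.random() < mask_prob:
            tokens.extend([MASK] * len(chars))  # mask all characters of the word
        else:
            tokens.extend(chars)
    return tokens


if __name__ == "__main__":
    segmented = ["使用", "语言", "模型"]  # hypothetical segmentation
    print(whole_word_mask(segmented, mask_prob=0.5, seed=1))
```

With character-level masking, a model could often recover a masked character from the other half of the same word; masking the whole word removes that shortcut.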

Graph Attention Networks

labmlai/annotated_deep_learning_paper_implementations ICLR 2018

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

Document Classification Graph Attention
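The masked self-attention idea can be sketched in a few lines: each node scores only its graph neighbors (and itself), and a softmax over those scores gives the attention weights. The sketch assumes a single attention head, node features that have already been linearly transformed, and a LeakyReLU slope of 0.2; the function names are hypothetical and do not mirror any listed repository.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(adj, h, a):
    """Return an n x n matrix of attention weights.

    adj: adjacency matrix (nonzero entry = edge)
    h:   transformed node features, one list of floats per node
    a:   attention vector with twice the feature length
    Non-neighbors keep weight 0.0, which is the "masking".
    """
    n = len(h)
    weights = []
    for i in range(n):
        neighbors = [j for j in range(n) if adj[i][j] or i == j]
        # Score each neighbor from the concatenated pair of feature vectors.
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, h[i] + h[j])))
            for j in neighbors
        ]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [0.0] * n
        for j, e in zip(neighbors, exps):
            row[j] = e / total
        weights.append(row)
    return weights


if __name__ == "__main__":
    path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    features = [[1.0], [2.0], [3.0]]
    print(attention_weights(path_graph, features, a=[0.0, 1.0]))
```

Each node's updated representation would then be the weighted sum of its neighbors' features, so unlike a fixed graph convolution the contribution of each neighbor is learned.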

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

facebookresearch/LASER TACL 2019

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.

Cross-Lingual Bitext Mining Cross-Lingual Document Classification

Modular Multimodal Architecture for Document Classification

microsoft/unilm 9 Dec 2019

Page classification is a crucial component of any document analysis system, allowing for complex branching control flows for different components of a given document.

Document Classification General Classification

Robust Cross-lingual Embeddings from Parallel Sentences

epfml/sent2vec 28 Dec 2019

Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.

Cross-Lingual Document Classification Document Classification
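The mapping-based approach mentioned in the last abstract can be illustrated with a toy example: a linear transformation projects source-language vectors into the target space, and translation candidates are found by cosine similarity. The embeddings and the matrix below are hypothetical toy values, and the matrix is assumed to be already learned; this is not code from epfml/sent2vec.

```python
import math

# Hypothetical 2-dimensional embeddings (illustration only).
SOURCE_EMBEDDINGS = {"chat": [0.0, 1.0], "chien": [1.0, 0.0]}
TARGET_EMBEDDINGS = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

# Assumed to have been learned beforehand; here it simply swaps the two axes.
MAPPING = [[0.0, 1.0], [1.0, 0.0]]


def project(matrix, vector):
    """Apply the linear transformation to one embedding."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def translate(word, src_emb, tgt_emb, matrix):
    """Project a source word into the shared space and return its nearest target word."""
    projected = project(matrix, src_emb[word])
    return max(tgt_emb, key=lambda t: cosine(projected, tgt_emb[t]))


if __name__ == "__main__":
    print(translate("chat", SOURCE_EMBEDDINGS, TARGET_EMBEDDINGS, MAPPING))
```

Once both languages share a space, a classifier trained on documents in one language can in principle be applied to documents in the other, which is the setting of cross-lingual document classification.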