DocBERT: BERT for Document Classification

17 Apr 2019 · Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin

We present, to our knowledge, the first application of BERT to document classification. A few characteristics of the task might lead one to think that BERT is not the most appropriate model: syntactic structures matter less for content categories, documents can often be longer than typical BERT input, and documents often have multiple labels...
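One practical consequence of the length mismatch mentioned above is that long documents must be truncated or split to fit BERT's 512-token input window. A minimal sketch (plain Python; `chunk_tokens` is a hypothetical helper, not from the paper) of chunking a tokenized document while reserving slots for the [CLS] and [SEP] tokens that are prepended and appended to each chunk downstream:

```python
def chunk_tokens(token_ids, max_len=512, reserved=2):
    """Split a tokenized document into pieces that fit BERT's window.

    `reserved` leaves room for the special [CLS] and [SEP] tokens
    added to every chunk before it is fed to the model.
    """
    body = max_len - reserved  # usable tokens per chunk
    return [token_ids[i:i + body] for i in range(0, len(token_ids), body)]

# Example: a 1200-token document yields chunks of at most 510 tokens.
doc = list(range(1200))
print([len(c) for c in chunk_tokens(doc)])  # [510, 510, 180]
```

Simple head-only truncation (keeping just the first chunk) is the most common baseline; chunking with score aggregation is one alternative when the tail of a document carries label-relevant content.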


Results from the Paper


Task                     Dataset        Model       Metric                 Value  Global Rank
Document Classification  AAPD           KD-LSTMreg  F1                     72.9   # 1
Text Classification      IMDb           KD-LSTMreg  Accuracy (2 classes)   -      # 11
Text Classification      IMDb           KD-LSTMreg  Accuracy (10 classes)  53.7   # 2
Document Classification  Reuters-21578  KD-LSTMreg  F1                     88.9   # 3
Document Classification  Yelp-14        KD-LSTMreg  Accuracy               69.4   # 1

Results from Other Papers


Task                       Dataset             Model    Metric    Value  Global Rank  Source Paper
Clinical Note Phenotyping  I2B2 2006: Smoking  DocBERT  Micro F1  80.2   # 2          Adhikari et al. (2019)
Clinical Note Phenotyping  I2B2 2008: Obesity  DocBERT  Micro F1  67.6   # 3          Adhikari et al. (2019)

Methods used in the Paper


Method                           Type
Residual Connection              Skip Connections
Attention Dropout                Regularization
Linear Warmup With Linear Decay  Learning Rate Schedules
Weight Decay                     Regularization
GELU                             Activation Functions
Dense Connections                Feedforward Networks
Adam                             Stochastic Optimization
WordPiece                        Subword Segmentation
Softmax                          Output Functions
Dropout                          Regularization
Multi-Head Attention             Attention Modules
Layer Normalization              Normalization
Scaled Dot-Product Attention     Attention Mechanisms
BERT                             Language Models