Modular Multimodal Architecture for Document Classification

Page classification is a crucial component to any document analysis system, allowing for complex branching control flows for different components of a given document. Utilizing both the visual and textual content of a page, the proposed method exceeds the current state-of-the-art performance on the RVL-CDIP benchmark at 93.03% test accuracy...

PDF Abstract

Datasets


TASK DATASET MODEL METRIC NAME METRIC VALUE GLOBAL RANK RESULT BENCHMARK
Multi-Modal Document Classification RVL-CDIP VGG16 + BoW-300K Accuracy 93.03 # 1

Methods used in the Paper


METHOD TYPE
🤖 No Methods Found Help the community by adding them if they're not listed; e.g. Deep Residual Learning for Image Recognition uses ResNet