Modular Multimodal Architecture for Document Classification

9 Dec 2019 · Tyler Dauphinee, Nikunj Patel, Mohammad Rashidi

Page classification is a crucial component of any document analysis system, allowing downstream processing to branch according to the type of each page in a given document. Using both the visual and textual content of a page, the proposed method exceeds the current state-of-the-art performance on the RVL-CDIP benchmark, reaching 93.03% test accuracy.
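
The abstract does not say how the visual and textual signals are combined, so the following is only an illustrative sketch of one common option, late fusion by weighted averaging of per-class probabilities. It is not the authors' implementation. The function names, the `visual_weight` parameter, the class labels, and the example scores are all hypothetical; in practice the two score vectors would come from an image model and a text model respectively.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(visual_probs, text_probs, visual_weight=0.5):
    """Weighted average of two per-class probability vectors (late fusion).

    visual_weight is a hypothetical tuning knob, not a value from the paper.
    """
    if len(visual_probs) != len(text_probs):
        raise ValueError("both modalities must score the same set of classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    w = visual_weight
    return [w * v + (1.0 - w) * t for v, t in zip(visual_probs, text_probs)]


def predict(visual_probs, text_probs, labels, visual_weight=0.5):
    """Return the label with the highest fused probability, plus the fused vector."""
    fused = fuse(visual_probs, text_probs, visual_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


if __name__ == "__main__":
    # Hypothetical labels and raw scores, for illustration only.
    labels = ["letter", "invoice", "memo"]
    visual_probs = softmax([2.0, 1.0, 0.5])  # stand-in for an image model's output
    text_probs = softmax([0.2, 2.5, 0.1])    # stand-in for a text model's output
    label, fused = predict(visual_probs, text_probs, labels)
    print(label, [round(p, 3) for p in fused])
```

In this toy example the visual scores favour "letter" while the textual scores strongly favour "invoice", and the equally weighted fusion selects "invoice". A learned meta-classifier over the concatenated outputs is another common way to combine modalities.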


Task                                 Dataset   Model             Metric Name  Metric Value  Global Rank
Multi-Modal Document Classification  RVL-CDIP  VGG16 + BoW-300K  Accuracy     93.03         #1