RVL-CDIP_MP-N can serve its original purpose as a covariate shift test set, now for multi-page document classification. We retrieved the original full documents from DocumentCloud and through web search.
It uses the same 16-class label taxonomy as RVL-CDIP and contains close to 1K documents in PDF format, averaging 10 pages per document.