Key-value Pair Extraction
9 papers with code • 2 benchmarks • 3 datasets
Extract key-value pairs from a form-like document.
A prediction counts as a true positive (TP) only if both the predicted key content and the predicted value content match the ground truth. The key-value pair F1-score is used as the metric.
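As a rough illustration of the metric, the sketch below computes pair-level precision, recall, and F1 for one document. It is not the official scorer of any benchmark: the function names (`normalize`, `kv_pair_f1`) are hypothetical, and the matching rule (exact string match after whitespace and case normalization) is an assumption, since individual benchmarks may define a "match" differently.

```python
from collections import Counter


def normalize(text: str) -> str:
    # Assumed normalization: collapse whitespace and ignore case.
    return " ".join(text.split()).lower()


def kv_pair_f1(predicted, gold):
    """Return (precision, recall, f1) over lists of (key, value) string pairs.

    A predicted pair is a true positive only if BOTH its key and its value
    match a ground-truth pair. Counters are used so that a repeated gold
    pair can be matched at most as many times as it occurs.
    """
    pred = Counter((normalize(k), normalize(v)) for k, v in predicted)
    true = Counter((normalize(k), normalize(v)) for k, v in gold)

    tp = sum((pred & true).values())  # multiset intersection
    n_pred = sum(pred.values())
    n_true = sum(true.values())

    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold_pairs = [("Name", "John Smith"), ("Date", "2020-01-01"), ("Total", "$10")]
pred_pairs = [("Name", "John  Smith"), ("Date", "2020-01-02")]
precision, recall, f1 = kv_pair_f1(pred_pairs, gold_pairs)
print(precision, recall, f1)
```

In this toy example only the "Name" pair is a true positive: the "Date" key is correct but its value is wrong, so the pair counts as both a false positive and a missed ground-truth pair.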
Most implemented papers
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
OCR-free Document Understanding Transformer
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems.
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
However, simply concatenating semantic entity recognition (SER) and relation extraction (RE) serially can lead to severe error propagation, and it fails to handle cases like multi-line entities in real scenarios.
KVP10k: A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents
In recent years, the challenge of extracting information from business documents has emerged as a critical task, finding applications across numerous domains.