Key Information Extraction
38 papers with code • 6 benchmarks • 11 datasets
Key Information Extraction (KIE) aims to extract structured information (e.g. key-value pairs) from form-style documents (e.g. invoices), an important step towards intelligent document understanding.
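To make the input and output of the task concrete, here is a minimal sketch of turning invoice text into key-value pairs. It is a rule-based toy, not how the learned models listed on this page work; the field names, the regular expressions in `FIELD_PATTERNS`, and the sample invoice are all assumptions made up for illustration.

```python
import re
from typing import Dict

# Hypothetical field patterns for a toy invoice layout (assumption:
# real documents vary far more than these patterns allow for).
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:\s*([0-9]+(?:\.[0-9]{2})?)", re.IGNORECASE),
}


def extract_fields(text: str) -> Dict[str, str]:
    """Return the first match of each known field as a key-value pair."""
    fields: Dict[str, str] = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[key] = match.group(1)
    return fields


# Made-up OCR output for a simple invoice.
sample_text = "ACME Corp\nInvoice No: INV-0042\nDate: 2023-05-17\nTotal: 128.50\n"
result = extract_fields(sample_text)
print(result)
```

Hand-written rules like these break as soon as the template changes, which is one reason the papers below learn from text, layout, and image features instead.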
Most implemented papers
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which benefits many real-world document image understanding tasks, such as information extraction from scanned documents.
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks
Computer vision with state-of-the-art deep learning models has recently achieved great success in Optical Character Recognition (OCR), including text detection and recognition tasks.
Spatial Dual-Modality Graph Reasoning for Key Information Extraction
To evaluate our proposed method thoroughly and to encourage future research, we release a new dataset named WildReceipt, which is collected and annotated specifically for evaluating key information extraction from document images of unseen templates in the wild.
Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations
Our experiments show that CRF with visual features outperformed both a heuristic and a CRF model with only text-based features.
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents
On the other hand, this paper tackles the problem by going back to the basics: an effective combination of text and layout.
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding.
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems.
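The order dependence of BIO tagging can be illustrated with a small sketch. This is a generic BIO decoder, not the Token Path Prediction method from the paper; the function `decode_bio`, the labels, and the token sequences are hypothetical examples. The same tokens carry the same tags in both runs, and only their order differs.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) entities by scanning BIO tags in input order."""
    entities: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity being built, if any.
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" tag, or an I- tag that does not continue the open entity:
            # close the open entity and drop this token.
            flush()
            current_label, current_tokens = None, []
    flush()
    return entities


# Tokens in the intended reading order.
ordered_tokens = ["ACME", "Corp", "221B", "Baker", "Street"]
ordered_tags = ["B-COMPANY", "I-COMPANY", "B-ADDRESS", "I-ADDRESS", "I-ADDRESS"]

# The same tokens and tags, but interleaved as an OCR system might
# arrange text from a two-column layout.
shuffled_tokens = ["ACME", "221B", "Corp", "Baker", "Street"]
shuffled_tags = ["B-COMPANY", "B-ADDRESS", "I-COMPANY", "I-ADDRESS", "I-ADDRESS"]

ordered_entities = decode_bio(ordered_tokens, ordered_tags)
shuffled_entities = decode_bio(shuffled_tokens, shuffled_tags)
print(ordered_entities)   # [('COMPANY', 'ACME Corp'), ('ADDRESS', '221B Baker Street')]
print(shuffled_entities)  # [('COMPANY', 'ACME'), ('ADDRESS', '221B')]
```

Even with every token tagged correctly, the interleaved order yields truncated entities, because a BIO entity must be a contiguous run in the input sequence.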