Scene Text Recognition
121 papers with code • 15 benchmarks • 27 datasets
See Scene Text Detection for leaderboards in this task.
Libraries
Use these libraries to find Scene Text Recognition models and implementations
Latest papers
Efficient scene text image super-resolution with semantic guidance
Scene text image super-resolution has significantly improved the accuracy of scene text recognition.
Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition
By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.
Text Image Inpainting via Global Structure-Guided Diffusion Models
Leveraging the global structure of the text as a prior, the proposed GSDM develops an efficient diffusion model to recover clean texts.
VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition
In this work, we propose the VIsion Permutable extractor for fast and efficient scene Text Recognition (VIPTR), which achieves an impressive balance between high performance and rapid inference speeds in the domain of STR.
An Empirical Study of Scaling Law for OCR
The laws of model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP).
Cross-Lingual Learning in Multilingual Scene Text Recognition
We aim to find the condition that exploits knowledge from high-resource languages for improving performance in low-resource languages.
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios.
Scene Text Image Super-resolution based on Text-conditional Diffusion Models
Utilizing this capability, we propose a novel framework for synthesizing LR-HR paired text image datasets.
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
We assess the model's performance across a range of OCR tasks, including scene text recognition, handwritten text recognition, handwritten mathematical expression recognition, table structure recognition, and information extraction from visually rich documents.
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines.