Scene Text Recognition
121 papers with code • 15 benchmarks • 27 datasets
See Scene Text Detection for leaderboards in this task.
Libraries
Use these libraries to find Scene Text Recognition models and implementations
Latest papers
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition
In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP) model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature Distillation framework (named CLIP-OCR) to leverage both visual and linguistic knowledge in CLIP.
Towards Large-scale Building Attribute Mapping using Crowdsourced Images: Scene Text Recognition on Flickr and Problems to be Solved
This work addresses the challenges in applying Scene Text Recognition (STR) in crowdsourced street-view images for building attribute mapping.
Orientation-Independent Chinese Text Recognition in Scene Images
We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information.
Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning
However, despite Chinese characters possessing different characteristics from Latin characters, such as complex inner structures and large categories, few methods have been proposed for Chinese Text Recognition (CTR).
DTrOCR: Decoder-only Transformer for Optical Character Recognition
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features.
LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition
The diversity in length constitutes a significant characteristic of text.
Relational Contrastive Learning for Scene Text Recognition
We argue that such prior contextual information can be interpreted as the relations of textual primitives due to the heterogeneous text and background, which can provide effective self-supervised labels for representation learning.
Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition
Specifically, MGP-STR achieves an average recognition accuracy of 94% on standard benchmarks for scene text recognition.
Context Perception Parallel Decoder for Scene Text Recognition
We first present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception.
Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement
Scene text image super-resolution (STISR), aiming to improve image quality while boosting downstream scene text recognition accuracy, has recently achieved great success.