Scene Text Recognition
121 papers with code • 15 benchmarks • 27 datasets
See Scene Text Detection for leaderboards in this task.
Libraries
Use these libraries to find Scene Text Recognition models and implementations
Most implemented papers
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Such an operation guides the vision model to use not only the visual texture of characters, but also the linguistic information in the visual context for recognition when the visual cues are confused (e.g., occlusion, noise).
STN-OCR: A single Neural Network for Text Detection and Text Recognition
In contrast to most existing works, which consist of multiple deep neural networks and several pre-processing steps, we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way.
TextBoxes++: A Single-Shot Oriented Scene Text Detector
In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.
NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition
Considering that scene images have large variation in text and background, we further design a modality-transform block to effectively transform 2D input images to 1D sequences, combined with the encoder to extract more discriminative features.
ASTER: An Attentional Scene Text Recognizer with Flexible Rectification
Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications.
Visual Re-ranking with Natural Language Understanding for Text Spotting
We propose a post-processing approach to improve scene text recognition accuracy by using occurrence probabilities of words (unigram language model), and the semantic correlation between scene and text.
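As a rough illustration of the unigram part of this idea, the sketch below re-ranks a recognizer's candidate words by combining each candidate's recognition probability with a smoothed unigram log-probability. It is not the paper's implementation: the function name `rerank`, the `weight` and `smoothing` parameters, the add-one style smoothing, and the toy counts are all assumptions made for illustration, and the scene-text semantic correlation term is left out entirely.

```python
import math


def rerank(candidates, unigram_counts, weight=1.0, smoothing=1.0):
    """Re-rank (word, recognizer_probability) pairs with a unigram language model.

    Hypothetical helper: combines log recognizer probability with a weighted,
    additively smoothed unigram log-probability and sorts best-first.
    """
    total = sum(unigram_counts.values())
    vocab = len(unigram_counts)

    def lm_logprob(word):
        # Reserve one extra vocabulary slot so unseen words get non-zero mass.
        count = unigram_counts.get(word.lower(), 0)
        return math.log((count + smoothing) / (total + smoothing * (vocab + 1)))

    scored = [
        (word, math.log(prob) + weight * lm_logprob(word))
        for word, prob in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example: the recognizer slightly prefers a misspelling.
candidates = [("c0ffee", 0.40), ("coffee", 0.35), ("toffee", 0.25)]
unigram_counts = {"coffee": 900, "toffee": 100}  # made-up corpus counts
ranked = rerank(candidates, unigram_counts)
```

With these made-up counts, the frequent word "coffee" moves ahead of the recognizer's top guess, because the unseen string "c0ffee" receives only the smoothing mass from the language model.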
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
Synthetic data has been a critical tool for training scene text detection and recognition models.
SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition
Arbitrary text appearance poses a great challenge in scene text recognition tasks.
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively.
Vision Transformer for Fast and Efficient Scene Text Recognition
On a comparable strong baseline method such as TRBA with accuracy of 84.3%, our small ViTSTR achieves a competitive accuracy of 82.6% (84.2% with data augmentation) at 2.4x speed up, using only 43.4% of the number of parameters and 42.2% FLOPS.