Scene Text Recognition
121 papers with code • 15 benchmarks • 27 datasets
See Scene Text Detection for leaderboards in this task.
Latest papers with no code
Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images, consequently elevating recognition accuracy in Scene Text Recognition (STR).
Reading Between the Lanes: Text VideoQA on the Road
Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness.
DiffusionSTR: Diffusion Model for Scene Text Recognition
This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild.
Weakly Supervised Scene Text Generation for Low-resource Languages
A large number of annotated training images is crucial for training successful scene text recognition models.
Masked and Permuted Implicit Context Learning for Scene Text Recognition
We utilize the training procedure of permuted language modeling (PLM), and to integrate masked language modeling (MLM), we incorporate word length information into the decoding process and replace the undetermined characters with mask tokens.
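As a rough illustration of the masking step described above, the sketch below builds a decoder input from a known word length, filling every position that has not yet been determined with a mask token. The function name `build_masked_sequence`, the `[MASK]` string, and the position-to-character dictionary are all hypothetical choices made for this example; the paper's actual tokenization and decoding procedure may differ.

```python
MASK = "[MASK]"  # hypothetical mask token, chosen for illustration


def build_masked_sequence(word_length, decoded):
    """Return a list of word_length tokens.

    decoded maps a position (0-based) to an already determined character.
    Every position missing from decoded is filled with the mask token.
    """
    for pos in decoded:
        if pos < 0 or pos >= word_length:
            raise ValueError(f"position {pos} is outside the word length {word_length}")
    return [decoded.get(i, MASK) for i in range(word_length)]


# A five-character word where only positions 0 and 3 have been decoded so far.
seq = build_masked_sequence(5, {0: "h", 3: "l"})
print(seq)
```

Fixing the sequence length up front is what lets undetermined positions be represented explicitly, rather than being left off the end of a partially decoded string.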
Scene Text Recognition with Image-Text Matching-guided Dictionary
Inspired by image-text contrastive learning (ITC), the SITM network combines the visual features and the text features of all candidates to identify the candidate with the minimum distance in the feature space.
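The minimum-distance selection mentioned above can be sketched in a few lines, assuming each candidate word already has a text feature vector and the image has a visual feature vector in the same space. The Euclidean distance, the toy three-dimensional vectors, and the names `euclidean` and `pick_candidate` are assumptions for illustration only; the snippet does not reproduce how the SITM network computes or combines its features.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_candidate(visual_feature, candidate_features):
    """Return the candidate whose text feature is closest to the visual feature.

    candidate_features maps each candidate word to its feature vector.
    """
    return min(
        candidate_features,
        key=lambda word: euclidean(visual_feature, candidate_features[word]),
    )


# Toy feature vectors, made up for this example.
visual = [0.9, 0.1, 0.3]
candidates = {
    "house": [0.8, 0.2, 0.3],
    "horse": [0.1, 0.9, 0.5],
    "mouse": [0.5, 0.5, 0.5],
}
best = pick_candidate(visual, candidates)
print(best)
```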
Improving Scene Text Recognition for Character-Level Long-Tailed Distribution
However, STR models degrade substantially on languages with a large number of characters (e.g., Chinese and Korean), especially on characters that rarely appear due to the long-tailed distribution of characters in such languages.
Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model
Despite the success of deep neural networks (DNNs) on sequential data (i.e., scene text and speech) recognition, they suffer from the over-confidence problem, mainly due to overfitting when trained with the cross-entropy loss, which may make decision-making less reliable.
Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition
Capturing images is a key part of automation for high-level tasks such as scene text recognition.
Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition
While vision transformers have been highly successful in improving the performance in image-based tasks, not much work has been reported on applying transformers to multilingual scene text recognition due to the complexities in the visual appearance of multilingual texts.