Scene Text Recognition
121 papers with code • 15 benchmarks • 27 datasets
See Scene Text Detection for leaderboards in this task.
Libraries
Use these libraries to find Scene Text Recognition models and implementations.
Latest papers
Revisiting Scene Text Recognition: A Data Perspective
To this end, we consolidate a large-scale real STR dataset, namely Union14M, which comprises 4 million labeled images and 10 million unlabeled images, to assess the performance of STR models in more complex real-world scenarios.
Looking and Listening: Audio Guided Text Recognition
Text recognition in the wild is a long-standing problem in computer vision.
MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition
In this paper, we propose the Incremental MLTR (IMLTR) task in the context of incremental learning (IL), where different languages are introduced in batches.
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP.
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism to text rectification for the first time.
Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition
Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task.
Geometric Perception based Efficient Text Recognition
Every Scene Text Recognition (STR) task consists of text localization and text recognition as the prominent sub-tasks.
B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution
Our network outperforms both a transformer-based reconstruction and an implicit Fourier representation method in almost every upscaling factor, thanks to the positive constraint and compact support of the B-spline basis.
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language models with noisy input.
Masked Vision-Language Transformers for Scene Text Recognition
MVLT is trained in two stages: in the first stage, we design an STR-tailored pretraining method based on a masking strategy; in the second stage, we fine-tune our model and adopt an iterative correction method to improve the performance.