Scene Text Recognition

121 papers with code • 15 benchmarks • 27 datasets

See Scene Text Detection for leaderboards in this task.


Revisiting Scene Text Recognition: A Data Perspective

idea-research/t-rex ICCV 2023

We consolidate a large-scale real STR dataset, namely Union14M, which comprises 4 million labeled images and 10 million unlabeled images, to assess the performance of STR models in more complex real-world scenarios.

1,862 stars • 17 Jul 2023

Looking and Listening: Audio Guided Text Recognition

wenwenyu/audioocr 6 Jun 2023

Text recognition in the wild is a long-standing problem in computer vision.

10 stars

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

simplify23/MRN ICCV 2023

In this paper, we propose the Incremental Multilingual Text Recognition (IMLTR) task in the context of incremental learning (IL), where different languages are introduced in batches.

46 stars • 24 May 2023

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

VamosC/CLIP4STR 23 May 2023

We transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon the image and text encoders of CLIP.

62 stars

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

simplify23/tps_pp 9 May 2023

In this work, we introduce TPS++, an attention-enhanced TPS transformation that, for the first time, incorporates the attention mechanism into text rectification.

36 stars
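For readers unfamiliar with the TPS (thin-plate spline) transformation that TPS++ builds on, the sketch below fits a plain 2D TPS warp from control-point pairs, as classic TPS-based rectifiers do. It is not the TPS++ method: it has no attention component, and the function names (`fit_tps`, `tps_kernel`, `solve`) are hypothetical helpers written for illustration. It assumes the standard kernel U(r) = r² log r² and at least three non-collinear, distinct control points.

```python
import math

def tps_kernel(r2):
    # TPS radial basis U(r) = r^2 * log(r^2), taking U(0) = 0.
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)

def solve(a, b):
    # Gaussian elimination with partial pivoting for a dense n x n system.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: need >= 3 non-collinear points")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def fit_tps(src, dst):
    """Return a map (x, y) -> (x', y') that sends each src[i] exactly to dst[i]."""
    n = len(src)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        # Kernel block K[i][j] = U(|p_i - p_j|).
        for j, (xj, yj) in enumerate(src):
            a[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        # Affine block P = [1, x, y] and its transpose.
        for k, v in enumerate((1.0, xi, yi)):
            a[i][n + k] = v
            a[n + k][i] = v
    # Solve [K P; P^T 0][w; a] = [v; 0] separately for the x and y outputs.
    params = [solve(a, [p[d] for p in dst] + [0.0, 0.0, 0.0]) for d in range(2)]

    def transform(x, y):
        out = []
        for coef in params:
            w, (a0, ax, ay) = coef[:n], coef[n:]
            val = a0 + ax * x + ay * y  # affine part
            for wi, (xi, yi) in zip(w, src):
                val += wi * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)  # bending part
            out.append(val)
        return tuple(out)

    return transform

# Usage: keep the four corners fixed and nudge the centre point upward.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
dst = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.6)]
warp = fit_tps(src, dst)
```

The affine part handles global shifts, scaling and shear, while the kernel weights add a smooth, minimum-bending deformation. This is why TPS suits curved or perspective-distorted text: a handful of fiducial points along the text boundary is enough to describe the warp.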

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition

CyrilSterling/LPV 9 May 2023

Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task.

20 stars

Geometric Perception based Efficient Text Recognition

ACRA-FL/GeoTRNet 8 Feb 2023

Every Scene Text Recognition (STR) task consists of text localization and text recognition as the prominent sub-tasks.

1 star

B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution

byeonghyunpak/btc CVPR 2023

Our network outperforms both a transformer-based reconstruction method and an implicit Fourier representation method at almost every upscaling factor, thanks to the positive constraint and compact support of the B-spline basis.

22 stars • 01 Jan 2023
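The two B-spline properties credited above can be illustrated in a few lines. The snippet does not say which B-spline order or knot layout BTC uses, so this sketch assumes the uniform cubic B-spline on integer knots. The helper names `cubic_bspline` and `reconstruct` are hypothetical. The sketch shows that the basis is nonnegative (so nonnegative coefficients always give a nonnegative signal, one reading of the "positive constraint") and that it has compact support (each output sample depends on only four neighbouring coefficients).

```python
import math

def cubic_bspline(t):
    """Uniform cubic B-spline basis centred at 0; nonzero only on (-2, 2)."""
    a = abs(t)
    if a < 1.0:
        return 2.0 / 3.0 - a * a + a ** 3 / 2.0
    if a < 2.0:
        return (2.0 - a) ** 3 / 6.0
    return 0.0  # compact support: exactly zero for |t| >= 2

def reconstruct(coeffs, x):
    """Evaluate f(x) = sum_k coeffs[k] * B(x - k) over integer knots k = 0..len-1.

    Because of compact support, only knots floor(x)-1 .. floor(x)+2 can
    contribute, so the loop visits at most four coefficients.
    """
    base = math.floor(x)
    total = 0.0
    for k in range(base - 1, base + 3):
        if 0 <= k < len(coeffs):
            total += coeffs[k] * cubic_bspline(x - k)
    return total

# Usage: upsample a short nonnegative 1D signal by a factor of 4.
coeffs = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0]
upsampled = [reconstruct(coeffs, i / 4) for i in range(4 * (len(coeffs) - 1) + 1)]
```

Because the cubic basis also forms a partition of unity, constant coefficients reproduce a constant signal away from the borders. Compact support keeps evaluation local and cheap at any upscaling factor.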

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

fangshancheng/abinet-pp 19 Nov 2022

In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling, 2) unidirectional feature representation, and 3) a language model with noisy input.

76 stars

Masked Vision-Language Transformers for Scene Text Recognition

onealwj/mvlt 9 Nov 2022

MVLT is trained in two stages. In the first stage, we design an STR-tailored pretraining method based on a masking strategy; in the second stage, we fine-tune our model and adopt an iterative correction method to improve performance.

29 stars