Scene Text Recognition

121 papers with code • 15 benchmarks • 27 datasets

See Scene Text Detection for leaderboards in this task.

Benchmarks

These leaderboards are used to track progress in Scene Text Recognition.

Dataset	Best Model
ICDAR2013	CLIP4STR-L*
SVT	DTrOCR
ICDAR2015	DTrOCR
CUTE80	CPPD
SVTP	DTrOCR
IIIT5k	DTrOCR
ICDAR 2003	DTrOCR
COCO-Text	CLIP4STR-L
IC19-Art	CLIP4STR-L
HOST	CLIP4STR-L
WOST	CLIP4STR-L
MSDA	MetaSelf-Learning
Uber-Text	MGP-STR
SVT-P	ABINet-LV+TPS++
IC13	ABINet-LV+TPS++

Libraries

Use these libraries to find Scene Text Recognition models and implementations.

Library	Papers	Stars
PaddlePaddle/PaddleOCR	14	38,330
mindspore-lab/mindocr	7	155
Media-Smart/vedastr	6	531
alibabaresearch/advancedliteratemac…	5	894

See all 9 libraries.

Most implemented papers

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

wangyuxin87/VisionLAN • ICCV 2021

This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when the visual cues are confused (e.g., by occlusion or noise).

STN-OCR: A single Neural Network for Text Detection and Text Recognition

Bartzi/stn-ocr • 27 Jul 2017

In contrast to most existing works, which consist of multiple deep neural networks and several pre-processing steps, we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way.

TextBoxes++: A Single-Shot Oriented Scene Text Detector

MhLiao/TextBoxes_plusplus • 9 Jan 2018

In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.

NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition

PaddlePaddle/PaddleOCR • 4 Jun 2018

Considering that scene images have large variations in text and background, we further design a modality-transform block to effectively transform 2D input images to 1D sequences, combined with the encoder to extract more discriminative features.
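
As a rough picture of what turning a 2D input into a 1D sequence means, the sketch below reads a feature map column by column, so each time step holds everything seen in one vertical slice of the image. It is not NRTR's modality-transform block, which is a learned module; the function name `columns_to_sequence` and the toy feature map are hypothetical and only illustrate the change of shape.

```python
from typing import List

# Hypothetical layout: feature_map[row][column][channel]
FeatureMap = List[List[List[float]]]


def columns_to_sequence(feature_map: FeatureMap) -> List[List[float]]:
    """Flatten an H x W x C map into a length-W sequence of H*C vectors."""
    height = len(feature_map)
    width = len(feature_map[0])
    sequence = []
    for col in range(width):
        step = []
        # Stack the channel vectors of every row in this column.
        for row in range(height):
            step.extend(feature_map[row][col])
        sequence.append(step)
    return sequence


# Toy map with H=2, W=3, C=2 (made-up values).
toy_map = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]
sequence = columns_to_sequence(toy_map)
```

Here the sequence has one element per image column, which is the form a sequence encoder expects.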

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

bgshih/aster • 2018

Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications.

Visual Re-ranking with Natural Language Understanding for Text Spotting

ahmedssabir/Visual-Semantic-Relatedness-with-Word-Embedding • 29 Oct 2018

We propose a post-processing approach to improve scene text recognition accuracy by using occurrence probabilities of words (unigram language model), and the semantic correlation between scene and text.
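
A minimal sketch of this kind of post-processing, assuming the recognizer returns several candidate words with confidences. The excerpt does not say how the paper combines its signals, so multiplying the three probabilities (summing their logs) is an assumption, and the function `rerank`, the `floor` parameter, and all numbers below are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rerank(
    candidates: List[Tuple[str, float]],
    unigram_prob: Dict[str, float],
    relatedness: Dict[str, float],
    floor: float = 1e-9,
) -> List[Tuple[str, float]]:
    """Sort candidate words by a combined log-score, best first."""
    scored = []
    for word, recognizer_prob in candidates:
        # Assumed combination: recognizer confidence x unigram
        # probability x scene-text relatedness, floored to avoid log(0).
        score = (
            math.log(max(recognizer_prob, floor))
            + math.log(max(unigram_prob.get(word, 0.0), floor))
            + math.log(max(relatedness.get(word, 0.0), floor))
        )
        scored.append((word, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up example: the recognizer slightly prefers "bear", but the
# scene context (say, a bar) is far more related to "beer".
candidates = [("bear", 0.55), ("beer", 0.45)]
unigram_prob = {"bear": 0.0002, "beer": 0.0003}
relatedness = {"bear": 0.05, "beer": 0.60}
ranked = rerank(candidates, unigram_prob, relatedness)
```

With these illustrative values the language and scene signals outweigh the recognizer's small preference, so "beer" moves to the top of the list.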

UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World

Jyouhou/UnrealText • CVPR 2020

Synthetic data has been a critical tool for training scene text detection and recognition models.

SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

hikopensource/davar-lab-ocr • 27 May 2020

Arbitrary text appearance poses a great challenge in scene text recognition tasks.

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

FangShancheng/ABINet • CVPR 2021

Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively.

Vision Transformer for Fast and Efficient Scene Text Recognition

roatienza/deep-text-recognition-benchmark • 18 May 2021

Compared with a strong baseline such as TRBA, which has an accuracy of 84.3%, our small ViTSTR achieves a competitive accuracy of 82.6% (84.2% with data augmentation) at a 2.4x speed-up, using only 43.4% of the parameters and 42.2% of the FLOPS.
