Scene Text Recognition

121 papers with code • 15 benchmarks • 27 datasets

See Scene Text Detection for leaderboards in this task.

Benchmarks

These leaderboards are used to track progress in Scene Text Recognition.

Dataset	Best Model
ICDAR2013	CLIP4STR-L*
SVT	DTrOCR
ICDAR2015	DTrOCR
CUTE80	CPPD
SVTP	DTrOCR
IIIT5k	DTrOCR
ICDAR 2003	DTrOCR
COCO-Text	CLIP4STR-L
IC19-Art	CLIP4STR-L
HOST	CLIP4STR-L
WOST	CLIP4STR-L
MSDA	MetaSelf-Learning
Uber-Text	MGP-STR
SVT-P	ABINet-LV+TPS++
IC13	ABINet-LV+TPS++

Libraries

Use these libraries to find Scene Text Recognition models and implementations.

PaddlePaddle/PaddleOCR • 14 papers • 38,418 stars
mindspore-lab/mindocr • 7 papers • 157 stars
Media-Smart/vedastr • 6 papers • 531 stars
alibabaresearch/advancedliteratemac… • 5 papers • 918 stars

See all 9 libraries.

Latest papers

Efficient scene text image super-resolution with semantic guidance

sijieliu518/sgenet • 20 Mar 2024

Scene text image super-resolution has significantly improved the accuracy of scene text recognition.

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

melosy/cam • 21 Feb 2024

By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.

Text Image Inpainting via Global Structure-Guided Diffusion Models

blackprotoss/gsdm • 26 Jan 2024

Leveraging the global structure of the text as a prior, the proposed GSDM develops an efficient diffusion model to recover clean texts.

VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition

cxfyxl/viptr • 18 Jan 2024

In this work, we propose the VIsion Permutable extractor for fast and efficient scene Text Recognition (VIPTR), which achieves an impressive balance between high performance and rapid inference speeds in the domain of STR.

An Empirical Study of Scaling Law for OCR

large-ocr-model/large-ocr-model.github.io • 106 stars • 29 Dec 2023

The laws of model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP).

Cross-Lingual Learning in Multilingual Scene Text Recognition

ku21fan/cll-str • 17 Dec 2023

We aim to find the condition that exploits knowledge from high-resource languages for improving performance in low-resource languages.

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

bytedance/e2str • 22 Nov 2023

A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios.

Scene Text Image Super-resolution based on Text-conditional Diffusion Models

toyotainfotech/stisr-tcdm • 16 Nov 2023

Utilizing this capability, we propose a novel framework for synthesizing LR-HR paired text image datasets.

Exploring OCR Capabilities of GPT-4V(ision): A Quantitative and In-depth Evaluation

scut-dlvclab/gpt-4v_ocr • 104 stars • 25 Oct 2023

We assess the model's performance across a range of OCR tasks, including scene text recognition, handwritten text recognition, handwritten mathematical expression recognition, table structure recognition, and information extraction from visually rich documents.

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

alibabaresearch/advancedliteratemachinery • 918 stars • 19 Oct 2023

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines.