Scene Text Recognition

121 papers with code • 15 benchmarks • 27 datasets

See Scene Text Detection for leaderboards in this task.

Efficient scene text image super-resolution with semantic guidance

sijieliu518/sgenet 20 Mar 2024

Scene text image super-resolution has significantly improved the accuracy of scene text recognition.

8

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

melosy/cam 21 Feb 2024

By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.

15

Text Image Inpainting via Global Structure-Guided Diffusion Models

blackprotoss/gsdm 26 Jan 2024

Leveraging the global structure of the text as a prior, the proposed GSDM develops an efficient diffusion model to recover clean texts.

28

VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition

cxfyxl/viptr 18 Jan 2024

In this work, we propose the VIsion Permutable extractor for fast and efficient scene Text Recognition (VIPTR), which achieves an impressive balance between high performance and rapid inference speeds in the domain of STR.

25

An Empirical Study of Scaling Law for OCR

large-ocr-model/large-ocr-model.github.io 29 Dec 2023

The laws of model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP).

106

Cross-Lingual Learning in Multilingual Scene Text Recognition

ku21fan/cll-str 17 Dec 2023

We aim to find the condition that exploits knowledge from high-resource languages for improving performance in low-resource languages.

11

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

bytedance/e2str 22 Nov 2023

A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios.

25

Scene Text Image Super-resolution based on Text-conditional Diffusion Models

toyotainfotech/stisr-tcdm 16 Nov 2023

Utilizing this capability, we propose a novel framework for synthesizing LR-HR paired text image datasets.

8

Exploring OCR Capabilities of GPT-4V(ision): A Quantitative and In-depth Evaluation

scut-dlvclab/gpt-4v_ocr 25 Oct 2023

We assess the model's performance across a range of OCR tasks, including scene text recognition, handwritten text recognition, handwritten mathematical expression recognition, table structure recognition, and information extraction from visually rich documents.

104

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

alibabaresearch/advancedliteratemachinery 19 Oct 2023

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines.

918