Scene Text Detection

91 papers with code • 9 benchmarks • 15 datasets

Scene Text Detection is a computer vision task that involves automatically identifying and localizing text within natural images or videos. The goal of scene text detection is to develop algorithms that can robustly detect text and mark it with bounding boxes in uncontrolled, complex scenes, for example on street signs, billboards, or license plates.

Source: ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection
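To make "localizing text with bounding boxes" concrete, here is a minimal sketch that represents each detection as an axis-aligned box and scores predictions against ground truth by intersection over union (IoU). It is a simplified illustration under stated assumptions: the `Box`, `iou` and `match_detections` names are hypothetical, many benchmarks use rotated quadrilaterals or polygons for arbitrary-shaped text rather than axis-aligned boxes, and the 0.5 IoU threshold is a common convention, not a rule shared by every benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned text box in pixel coordinates (top-left and bottom-right corners)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Degenerate or inverted boxes have zero area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    overlap = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                  min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter = overlap.area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds: list[Box], gts: list[Box],
                     threshold: float = 0.5) -> tuple[float, float]:
    """Greedily match each prediction to its best unmatched ground-truth box.

    A prediction counts as a true positive if its best IoU is >= threshold.
    Returns (precision, recall).
    """
    matched: set[int] = set()
    true_pos = 0
    for p in preds:
        best_idx, best_score = None, threshold
        for i, g in enumerate(gts):
            if i in matched:
                continue
            score = iou(p, g)
            if score >= best_score:
                best_idx, best_score = i, score
        if best_idx is not None:
            matched.add(best_idx)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    return precision, recall


gts = [Box(0, 0, 10, 10), Box(20, 0, 30, 10)]
preds = [Box(1, 0, 11, 10), Box(50, 50, 60, 60)]
print(match_detections(preds, gts))  # (0.5, 0.5)
```

Here one prediction overlaps the first ground-truth box (IoU of 90/110, about 0.82) and the other is spurious, so precision and recall are both 0.5. Greedy matching keeps the sketch short; an optimal one-to-one assignment is more faithful in crowded scenes.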

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

alibabaresearch/advancedliteratemachinery 19 Oct 2023

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines.

894 stars

Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

alloydas/testr_eval 2 Oct 2023

The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions.

1 star

STEP -- Towards Structured Scene-Text Spotting

cvc-dag/step 5 Sep 2023

We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression.

2 stars
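The structured spotting task described in STEP can be illustrated with the simplest possible baseline: run any text spotter, then keep only the instances whose transcription fully matches the query regular expression. This is a hypothetical post-filtering sketch, not STEP's own method; the `SpottedText` record and `filter_by_query` helper are illustrative names, and the spotter output is assumed to be given.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SpottedText:
    """One spotter output: transcription plus box (x_min, y_min, x_max, y_max)."""
    text: str
    box: tuple[float, float, float, float]


def filter_by_query(spotted: list[SpottedText], pattern: str) -> list[SpottedText]:
    """Keep instances whose entire transcription matches the query regex."""
    query = re.compile(pattern)
    # fullmatch (not search) so "B-12345" does not satisfy r"[A-Z]-\d{4}".
    return [s for s in spotted if query.fullmatch(s.text)]


spotted = [
    SpottedText("EXIT", (0, 0, 40, 20)),
    SpottedText("B-1234", (50, 0, 120, 20)),
    SpottedText("B-12345", (0, 30, 80, 50)),
]
print([s.text for s in filter_by_query(spotted, r"[A-Z]-\d{4}")])  # ['B-1234']
```

A post-filter like this can only discard what the spotter already produced; it cannot correct a misread character in a transcription that would otherwise have matched.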

MixNet: Toward Accurate Detection of Challenging Scene Text in the Wild

D641593/MixNet 23 Aug 2023

Detecting small scene text instances in the wild is particularly challenging, where the influence of irregular positions and nonideal lighting often leads to detection errors.

42 stars

Turning a CLIP Model into a Scene Text Spotter

wenwenyu/tcm 21 Aug 2023

Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively.

146 stars

SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression

opendrivelab/elm 21 Aug 2023

In light of this, we constrain the incorporation of segmentation branches to the first few decoder layers and employ progressive regression refinement in subsequent layers, achieving performance gains while minimizing computational load from the mask. Furthermore, we propose a Mask-informed Query Enhancement module.

87 stars

LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network

ychensu/lranet 27 Jun 2023

Next, we propose a dual assignment scheme for speed acceleration.

13 stars

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

shannanyinxiang/viteraser 21 Jun 2023

As ViTEraser implicitly integrates text localization and inpainting, we propose a novel end-to-end pretraining method, termed SegMIM, which focuses the encoder and decoder on the text box segmentation and masked image modeling tasks, respectively.

17 stars
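SegMIM assigns masked image modeling (MIM) to the decoder. As a generic illustration of MIM, and not a reproduction of SegMIM, the sketch below hides a random subset of image patches and computes a reconstruction loss only over the hidden ones. Assumptions: the image is already split into a grid of patches, each patch is reduced to a single float for brevity (real models reconstruct pixel or feature vectors), and the function names and mask ratio are hypothetical.

```python
import random
from typing import Optional


def random_patch_mask(grid_h: int, grid_w: int, mask_ratio: float,
                      seed: Optional[int] = None) -> list[list[bool]]:
    """Return a grid_h x grid_w grid where True marks a patch hidden from the model."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    total = grid_h * grid_w
    num_masked = round(total * mask_ratio)
    rng = random.Random(seed)  # seeded generator so masks are reproducible
    masked = set(rng.sample(range(total), num_masked))
    # Convert flat patch indices back to (row, column) positions.
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]


def masked_mse(pred: list[list[float]], target: list[list[float]],
               mask: list[list[bool]]) -> float:
    """Mean squared reconstruction error over masked patches only."""
    errors = [(p - t) ** 2
              for pred_row, tgt_row, mask_row in zip(pred, target, mask)
              for p, t, m in zip(pred_row, tgt_row, mask_row)
              if m]
    return sum(errors) / len(errors) if errors else 0.0


mask = random_patch_mask(4, 4, mask_ratio=0.5, seed=0)
target = [[1.0] * 4 for _ in range(4)]
pred = [[0.0] * 4 for _ in range(4)]
print(sum(row.count(True) for row in mask))  # 8 of 16 patches masked
print(masked_mse(pred, target, mask))        # 1.0
```

Restricting the loss to masked patches means the model is scored only on content it could not see, so it cannot lower the loss by copying visible input.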

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

vitae-transformer/deepsolo 31 May 2023

In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.

222 stars

Turning a CLIP Model into a Scene Text Detector

wenwenyu/tcm CVPR 2023

Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection.

146 stars
28 Feb 2023