Scene Text Detection
91 papers with code • 9 benchmarks • 15 datasets
Scene Text Detection is a computer vision task that involves automatically identifying and localizing text within natural images or videos. The goal of scene text detection is to develop algorithms that can robustly detect and label text with bounding boxes in uncontrolled and complex environments, such as street signs, billboards, or license plates.
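Detection quality on this task is typically measured by matching predicted boxes to ground-truth boxes via Intersection-over-Union (IoU). A minimal sketch of the IoU computation for axis-aligned boxes (the function name and box format are illustrative; arbitrary-shaped benchmarks use polygon IoU instead):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 (identical boxes)
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333... (half-overlapping boxes)
```

A predicted box is usually counted as a true positive when its IoU with some ground-truth box exceeds a threshold such as 0.5.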
Source: ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection
Latest papers with no code
Separate Scene Text Detector for Unseen Scripts is Not All You Need
It raises a critical question: Is there a need for separate training for new scripts?
Adaptive Segmentation Network for Scene Text Detection
Besides, we design a Global-information Enhanced Feature Pyramid Network (GE-FPN) for capturing text instances with macro size and extreme aspect ratios.
CT-Net: Arbitrary-Shaped Text Detection via Contour Transformer
Contour based scene text detection methods have rapidly developed recently, but still suffer from inaccurate frontend contour initialization, multi-stage error accumulation, or deficient local information aggregation.
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework.
Deformable Kernel Expansion Model for Efficient Arbitrary-shaped Scene Text Detection
DKE employs a segmentation module to segment the shrunken text region as the text kernel, then expands the text kernel contour to obtain text boundary by regressing the vertex-wise offsets.
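The kernel-expansion idea above can be illustrated with a toy sketch (not the paper's implementation): push each vertex of a shrunken kernel polygon outward from the polygon centroid by a per-vertex offset, where a real model would regress those offsets from image features.

```python
import math

def expand_contour(kernel_pts, offsets):
    """Expand a closed kernel polygon by moving each vertex outward
    along the direction from the polygon centroid, scaled by its offset.
    Toy stand-in for vertex-wise offset regression."""
    cx = sum(p[0] for p in kernel_pts) / len(kernel_pts)
    cy = sum(p[1] for p in kernel_pts) / len(kernel_pts)
    expanded = []
    for (x, y), d in zip(kernel_pts, offsets):
        vx, vy = x - cx, y - cy
        norm = math.hypot(vx, vy) or 1.0  # avoid division by zero at the centroid
        expanded.append((x + d * vx / norm, y + d * vy / norm))
    return expanded

# A unit-ish square kernel expanded uniformly: each vertex moves diagonally outward.
kernel = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(expand_contour(kernel, [math.sqrt(2)] * 4))
```

Real methods also differ in how the expansion direction is chosen (e.g., local contour normals rather than centroid rays); this sketch only conveys the kernel-to-boundary expansion step.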
Domain Adaptive Scene Text Detection via Subcategorization
We study domain adaptive scene text detection, a largely neglected yet very meaningful task that aims for optimal transfer of labelled scene text images while handling unlabelled images in various new domains.
Aggregated Text Transformer for Scene Text Detection
We present the Aggregated Text TRansformer (ATTR), which is designed to represent texts in scene images with a multi-scale self-attention mechanism.
Text Growing on Leaf
Then, lateral and thin veins are generated along the main vein growth direction in polar coordinates.
DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection
We further propose a parallel design that integrates the convolutional network with a powerful self-attention mechanism to provide complementary clues between the attention path and convolutional path.
Shift Variance in Scene Text Detection
The theory of convolutional neural networks suggests the property of shift equivariance, i.e., that a shifted input causes an equally shifted output.
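Shift equivariance can be demonstrated in a few lines: a stride-1 convolution shifts its output in lockstep with the input, while downsampling (strided sampling of the output) breaks this property. A minimal 1-D sketch with NumPy (the signal and kernel are illustrative):

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain 1-D valid cross-correlation, stride 1."""
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(len(x) - len(k) + 1)])

x = np.array([0., 1., 3., 2., 0., 0., 0.])  # zero-padded signal so circular shift is safe
k = np.array([1., -1.])                     # simple difference kernel

y = conv1d_valid(x, k)
y_shifted = conv1d_valid(np.roll(x, 1), k)  # shift the input by one sample

# Stride-1 convolution is shift-equivariant: the output shifts by the same amount.
print(np.allclose(np.roll(y, 1), y_shifted))   # True

# Stride-2 subsampling of the output is not: the sampled values differ after a shift.
print(np.allclose(y[::2], y_shifted[::2]))     # False
```

This is exactly the effect strided convolutions and pooling introduce in detection backbones: a one-pixel shift of the input image can change the predictions, not merely shift them.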