Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

21 Feb 2022 · Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai

Segmentation-based methods have recently drawn extensive attention in scene text detection because their pixel-level descriptions let them detect text instances of arbitrary shapes and extreme aspect ratios. However, the vast majority of existing segmentation-based approaches are limited by their complex post-processing algorithms and by the scale robustness of their segmentation models: the post-processing algorithms are not only isolated from model optimization but also time-consuming, and scale robustness is usually strengthened by directly fusing multi-scale feature maps. In this paper, we propose a Differentiable Binarization (DB) module that integrates the binarization process, one of the most important steps in post-processing, into a segmentation network. Optimized along with the proposed DB module, the segmentation network produces more accurate results, which improves text detection accuracy while keeping the pipeline simple. Furthermore, an efficient Adaptive Scale Fusion (ASF) module is proposed to improve scale robustness by adaptively fusing features of different scales. By incorporating the proposed DB and ASF modules into the segmentation network, our scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.
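
The abstract does not spell out the DB function itself. As a minimal sketch, assuming the approximate step function B = 1 / (1 + exp(-k (P - T))) with an amplification factor k = 50, the contrast between standard hard binarization and a differentiable one might look like the following. The function names, the nested-list map representation, and the fixed threshold of 0.3 are illustrative assumptions, not taken from this page.

```python
import math


def hard_binarization(prob_map, t=0.3):
    """Standard binarization: a fixed-threshold step, not differentiable."""
    return [[1.0 if p >= t else 0.0 for p in row] for row in prob_map]


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate step function 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are same-shaped nested lists with values in
    [0, 1]; k (assumed 50 here) controls how sharply the sigmoid
    approaches a hard step.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 2x2 probability map with a per-pixel threshold map (assumed values).
prob = [[0.9, 0.1], [0.5, 0.3]]
thresh = [[0.3, 0.3], [0.3, 0.3]]

soft = differentiable_binarization(prob, thresh)
hard = hard_binarization(prob)
```

Because the soft version is smooth in both the probability and the threshold, gradients can flow through the binarization step during training, which is what allows it to be optimized together with the segmentation network.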


Results from the Paper


Task: Scene Text Detection. Precision, Recall, and F-Measure are percentages; the global rank for each metric is given in parentheses.

Dataset     Model                        Precision    Recall       F-Measure    FPS
ICDAR 2015  DBNet++ (ResNet-50) (1152)   90.9 (#11)   83.9 (#19)   87.3 (#13)   10 (#7)
ICDAR 2015  DBNet++ (ResNet-18) (736)    90.1 (#15)   77.2 (#36)   83.1 (#30)   44 (#3)
MSRA-TD500  DBNet++ (ResNet-18) (512)    89.7 (#7)    76.5 (#12)   82.6 (#11)   80 (#2)
MSRA-TD500  DBNet++ (ResNet-50) (736)    91.5 (#3)    83.3 (#3)    87.2 (#3)    29 (#7)
MSRA-TD500  DBNet++ (ResNet-18) (736)    87.9 (#10)   82.5 (#5)    85.1 (#5)    55 (#6)
Total-Text  DBNet++ (ResNet-18) (800)    87.4 (#15)   79.6 (#17)   83.3 (#18)   48 (#5)
Total-Text  DBNet++ (ResNet-50) (800)    88.9 (#11)   83.2 (#8)    86 (#9)      28 (#7)
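
The page does not define F-Measure; under the standard definition it is the harmonic mean of Precision and Recall. A small sketch, assuming that definition (the function name `f_measure` is illustrative), recomputes the ICDAR 2015 ResNet-50 result listed above from its Precision and Recall:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ICDAR 2015, DBNet++ (ResNet-50) (1152): Precision 90.9, Recall 83.9
print(round(f_measure(90.9, 83.9), 1))  # prints 87.3
```

Since the listed Precision and Recall are themselves rounded, a recomputed F-Measure can differ from the listed value by about 0.1 in some rows.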
