FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

3 Nov 2021 · Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector). Unlike recent advanced text detectors, whose complicated post-processing and hand-crafted network architectures result in low inference speed, FAST has two new designs. (1) We design a minimalist kernel representation (with only a 1-channel output) to model text of arbitrary shape, together with a GPU-parallel post-processing step that assembles text lines with negligible time overhead. (2) We search for a network architecture tailored to text detection, which yields more powerful features than most networks searched for image classification. Benefiting from these two designs, FAST achieves an excellent trade-off between accuracy and efficiency on several challenging datasets, including Total-Text, CTW1500, ICDAR 2015, and MSRA-TD500. For example, FAST-T yields 81.6% F-measure at 152 FPS on Total-Text, outperforming the previous fastest method by 1.7 points in F-measure and 70 FPS in speed. With TensorRT optimization, the inference speed can be further accelerated to over 600 FPS. Code and models will be released at https://github.com/czczup/FAST.
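The abstract does not spell out how a 1-channel kernel map becomes text instances, so the following is only a rough sketch of one plausible reading: threshold the map, label each connected kernel as one text instance, then grow every label outward to approximate the full text region. The function names, the 0.5 threshold, the 4-connectivity, and the dilation radius are all assumptions made for illustration and are not taken from the paper or its repository. The code runs on the CPU in pure Python purely for readability, whereas the paper describes its post-processing as GPU-parallel.

```python
from collections import deque


def label_kernels(score_map, threshold=0.5):
    """Label 4-connected regions of a 1-channel score map (assumed threshold).

    Returns (labels, count) where labels is a grid with 0 for background
    and 1..count for each detected kernel.
    """
    h, w = len(score_map), len(score_map[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if score_map[y][x] <= threshold or labels[y][x] != 0:
                continue
            # Start a new kernel and flood-fill it breadth-first.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] == 0
                            and score_map[ny][nx] > threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def dilate_labels(labels, radius=1):
    """Grow each kernel by taking the max label in a (2*radius+1)^2 window.

    Only background pixels are filled, so existing kernel pixels keep
    their label. Where two kernels compete, the larger label wins.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            if labels[y][x] != 0:
                continue
            best = 0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    best = max(best, labels[ny][nx])
            out[y][x] = best
    return out


# Toy 5x7 score map with two separate kernels (hypothetical values).
score_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
labels, count = label_kernels(score_map)
regions = dilate_labels(labels, radius=1)
for row in regions:
    print(row)
```

Because every output pixel of the window maximum depends only on a fixed neighbourhood of the input, a step of this shape maps naturally onto a parallel max-filter, which is one reason such a design could suit GPU execution.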


Results from the Paper


Task: Scene Text Detection. F-Measure, Precision, and Recall are percentages; the number after # is the global rank for that metric.

Dataset       Model        F-Measure     Precision     Recall        FPS
ICDAR 2015    FAST-B-1280  87.1 (#16)    89.7 (#18)    84.6 (#16)    15.7 (#6)
ICDAR 2015    FAST-B-896   86.3 (#20)    89.2 (#20)    83.6 (#20)    31.8 (#5)
ICDAR 2015    FAST-B-736   84.7 (#24)    88.0 (#26)    81.7 (#25)    42.7 (#4)
ICDAR 2015    FAST-S-736   82.9 (#31)    86.3 (#30)    79.8 (#31)    53.9 (#2)
ICDAR 2015    FAST-T-736   81.7 (#35)    86 (#31)      77.9 (#35)    60.9 (#1)
MSRA-TD500    FAST-B-736   87.3 (#2)     92.1 (#1)     83 (#4)       56.8 (#5)
MSRA-TD500    FAST-T-512   84.5 (#8)     91.1 (#5)     78.8 (#9)     137.2 (#1)
MSRA-TD500    FAST-T-736   84.9 (#6)     88.1 (#9)     81.9 (#6)     79.6 (#3)
MSRA-TD500    FAST-S-736   86.4 (#4)     91.6 (#2)     81.7 (#7)     72 (#4)
SCUT-CTW1500  FAST-B-640   84.2 (#7)     87.8 (#6)     80.9 (#10)    66.5 (#4)
SCUT-CTW1500  FAST-B-512   82.9 (#11)    85.7 (#10)    80.2 (#11)    92.6 (#3)
SCUT-CTW1500  FAST-S-512   82 (#13)      85.6 (#11)    78.7 (#14)    112.9 (#2)
SCUT-CTW1500  FAST-T-512   81.5 (#14)    85.5 (#12)    77.9 (#15)    129.1 (#1)
Total-Text    FAST-S-512   84.9 (#14)    88.3 (#12)    81.7 (#12)    115.5 (#2)
Total-Text    FAST-T-448   81.6 (#20)    86.5 (#16)    77.2 (#19)    152.8 (#1)
Total-Text    FAST-B-512   85.8 (#10)    89.6 (#8)     82.4 (#11)    93.2 (#3)
Total-Text    FAST-B-640   86.4 (#8)     89.9 (#5)     83.2 (#8)     67.5 (#4)
Total-Text    FAST-B-800   87.5 (#4)     90.0 (#4)     85.2 (#5)     46 (#6)
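
As a quick consistency check on the figures above, the F-measure is conventionally the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes it for one model per dataset, using the precision and recall values listed in the results. The helper name `f_measure` and the choice of rows are illustrative only, and small differences from the listed F-measure are expected because the inputs are already rounded to one decimal place.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (dataset, model, precision, recall, listed F-measure), copied from the results.
reported = [
    ("ICDAR 2015", "FAST-B-1280", 89.7, 84.6, 87.1),
    ("MSRA-TD500", "FAST-B-736", 92.1, 83.0, 87.3),
    ("SCUT-CTW1500", "FAST-T-512", 85.5, 77.9, 81.5),
    ("Total-Text", "FAST-T-448", 86.5, 77.2, 81.6),
]

# Recompute F for each row and keep the absolute gap to the listed value.
gaps = []
for dataset, model, p, r, listed in reported:
    computed = f_measure(p, r)
    gaps.append(abs(computed - listed))
    print(f"{dataset:13s} {model:12s} computed={computed:.2f} listed={listed}")
```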

Methods