TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text Spotting	Inverse-Text	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - No Lexicon	68.8	# 1
Text Spotting	Inverse-Text	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - Full Lexicon	75.8	# 1
Text Spotting	Inverse-Text	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - No Lexicon	64.6	# 2
Text Spotting	Inverse-Text	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - Full Lexicon	71.2	# 2
Text Spotting	Inverse-Text	DeepSolo (ResNet-50)	F-measure (%) - No Lexicon	48.5	# 4
Text Spotting	Inverse-Text	DeepSolo (ResNet-50)	F-measure (%) - Full Lexicon	53.9	# 4
Text Spotting	SCUT-CTW1500	DeepSolo (ResNet-50)	F-measure (%) - No Lexicon	64.2	# 2
Text Spotting	SCUT-CTW1500	DeepSolo (ResNet-50)	F-measure (%) - Full Lexicon	81.4	# 4
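
For readers unfamiliar with the metric, the F-measure values above are the harmonic mean of precision and recall. The sketch below shows only that arithmetic. The function name and the example inputs are illustrative assumptions, and each benchmark's own protocol for matching predictions to ground truth (with or without a lexicon) is not reproduced here.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        # No correct predictions at all: define the score as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only, not results from the paper.
score = f_measure(0.5, 0.5)
```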

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deepsolo-let-transformer-decoder-with-1/text-spotting-on-inverse-text)](https://paperswithcode.com/sota/text-spotting-on-inverse-text?p=deepsolo-let-transformer-decoder-with-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deepsolo-let-transformer-decoder-with-1/text-spotting-on-scut-ctw1500)](https://paperswithcode.com/sota/text-spotting-on-scut-ctw1500?p=deepsolo-let-transformer-decoder-with-1)`

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

31 May 2023 · Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Handling the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate heuristic post-processing, they still suffer from poor synergy between the sub-tasks and low training efficiency. They also overlook multilingual text spotting, which requires an additional script identification task. In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing through a single decoder, the point queries encode the requisite text semantics and locations, and can thus be decoded into the center line, boundary, script, and confidence of the text via very simple prediction heads in parallel. Furthermore, we show the surprisingly good extensibility of our method in terms of character class, language type, and task. On the one hand, our method not only performs well in English scenes but also handles transcription of scripts with complex font structures and thousands of character classes, such as Chinese. On the other hand, DeepSolo++ achieves better performance on the additionally introduced script identification task with a simpler training pipeline than previous methods. In addition, our models are compatible with line annotations, which cost much less to produce than polygons. The code is available at https://github.com/ViTAE-Transformer/DeepSolo.
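
The parallel decoding step described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the tensor sizes, the random untrained weights, the boundary-offset parameterization, and the mean pooling used for the instance-level script and confidence outputs are all assumptions made for illustration. It only shows how one set of point queries can feed several simple heads at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_INSTANCES = 4   # candidate text instances
NUM_POINTS = 25     # ordered points per instance
DIM = 32            # width of each point query
NUM_CHARS = 37      # character classes (assumed)
NUM_SCRIPTS = 5     # script classes (assumed)


def init_head(out_dim):
    """Random weights and zero bias for one linear head (untrained)."""
    return rng.normal(scale=0.02, size=(DIM, out_dim)), np.zeros(out_dim)


def linear(x, w, b):
    """Apply a linear head along the last axis of x."""
    return x @ w + b


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


heads = {
    "center": init_head(2),          # (x, y) of each center-line point
    "boundary": init_head(4),        # offsets to top and bottom boundary points
    "char": init_head(NUM_CHARS),    # per-point character logits
    "script": init_head(NUM_SCRIPTS),
    "conf": init_head(1),
}


def decode(queries):
    """Decode point queries of shape (instances, points, dim) with parallel heads."""
    center = sigmoid(linear(queries, *heads["center"]))  # normalized coordinates
    offsets = linear(queries, *heads["boundary"])
    chars = linear(queries, *heads["char"]).argmax(axis=-1)
    pooled = queries.mean(axis=1)  # assumed instance-level pooling
    script = linear(pooled, *heads["script"]).argmax(axis=-1)
    conf = sigmoid(linear(pooled, *heads["conf"]))[..., 0]
    return {
        "center": center,
        "top": center + offsets[..., :2],
        "bottom": center + offsets[..., 2:],
        "chars": chars,
        "script": script,
        "conf": conf,
    }


# Stand-in for the decoder output: random point queries.
queries = rng.normal(size=(NUM_INSTANCES, NUM_POINTS, DIM))
out = decode(queries)
```

Every head reads the same query tensor, so no head depends on another head's output. That independence is what "in parallel" refers to in this sketch.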


Code


vitae-transformer/deepsolo official

225

vitae-transformer/vitae-transformer…

Tasks


Scene Text Detection

Text Detection

Text Spotting

Datasets

SCUT-CTW1500

TextOCR

Results from the Paper


Ranked #1 on Text Spotting on Inverse-Text


Methods


Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
