TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text Spotting	ICDAR 2015	DeepSolo (ResNet-50)	F-measure (%) - Strong Lexicon	86.8	# 4
Text Spotting	ICDAR 2015	DeepSolo (ResNet-50)	F-measure (%) - Weak Lexicon	81.9	# 5
Text Spotting	ICDAR 2015	DeepSolo (ResNet-50)	F-measure (%) - Generic Lexicon	76.9	# 5
Text Spotting	ICDAR 2015	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - Strong Lexicon	88.0	# 3
Text Spotting	ICDAR 2015	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - Weak Lexicon	83.5	# 4
Text Spotting	ICDAR 2015	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - Generic Lexicon	79.1	# 4
Text Spotting	ICDAR 2015	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - Strong Lexicon	88.1	# 2
Text Spotting	ICDAR 2015	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - Weak Lexicon	83.9	# 2
Text Spotting	ICDAR 2015	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - Generic Lexicon	79.5	# 3
Text Spotting	Total-Text	DeepSolo (ResNet-50)	F-measure (%) - Full Lexicon	87.0	# 3
Text Spotting	Total-Text	DeepSolo (ResNet-50)	F-measure (%) - No Lexicon	79.7	# 3
Text Spotting	Total-Text	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - Full Lexicon	89.6	# 1
Text Spotting	Total-Text	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - No Lexicon	83.6	# 1
Text Spotting	Total-Text	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - Full Lexicon	88.7	# 2
Text Spotting	Total-Text	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - No Lexicon	82.5	# 2
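
The F-measure reported in the table above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the official evaluation script: the function name `f_measure` and the counts passed to it are hypothetical and chosen only for illustration. In end-to-end text spotting a prediction is typically counted as a true positive only when both its location and its transcription match a ground-truth instance; that matching step is assumed to have happened already.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-measure (harmonic mean of precision and recall) in percent."""
    predicted = true_positives + false_positives   # all instances the model output
    actual = true_positives + false_negatives      # all ground-truth instances
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
score = f_measure(true_positives=80, false_positives=10, false_negatives=20)
print(round(score, 1))  # 84.2
```

The lexicon variants in the metric names differ only in which word list, if any, is available to correct the transcriptions before matching; the formula itself does not change.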

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deepsolo-let-transformer-decoder-with/text-spotting-on-total-text)](https://paperswithcode.com/sota/text-spotting-on-total-text?p=deepsolo-let-transformer-decoder-with)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deepsolo-let-transformer-decoder-with/text-spotting-on-icdar-2015)](https://paperswithcode.com/sota/text-spotting-on-icdar-2015?p=deepsolo-let-transformer-decoder-with)`

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

CVPR 2023 · Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate the heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and low training efficiency. In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing a single decoder, the point queries have encoded requisite text semantics and locations, thus can be further decoded to the center line, boundary, script, and confidence of text via very simple prediction heads in parallel. Besides, we also introduce a text-matching criterion to deliver more accurate supervisory signals, thus enabling more efficient training. Quantitative experiments on public benchmarks demonstrate that DeepSolo outperforms previous state-of-the-art methods and achieves better training efficiency. In addition, DeepSolo is also compatible with line annotations, which require much less annotation cost than polygons. The code is available at https://github.com/ViTAE-Transformer/DeepSolo.
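
To make the "ordered points" idea in the abstract concrete, here is a minimal sketch of turning a text center line into a fixed number of ordered points. It is an illustration under stated assumptions, not the authors' implementation: the center line is assumed to be given as a polyline, the points are spaced evenly by arc length, and the function name `sample_ordered_points` and the point count are hypothetical. See the linked repository for how the released code actually builds its point queries.

```python
from math import hypot


def sample_ordered_points(center_line, num_points):
    """Sample num_points evenly spaced by arc length along a polyline [(x, y), ...]."""
    if num_points < 2 or len(center_line) < 2:
        raise ValueError("need at least two vertices and two sample points")

    # Cumulative arc length at each polyline vertex.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(center_line, center_line[1:]):
        cumulative.append(cumulative[-1] + hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]

    points = []
    seg = 0  # index of the segment that contains the current target length
    for i in range(num_points):
        target = total * i / (num_points - 1)
        while seg < len(center_line) - 2 and cumulative[seg + 1] < target:
            seg += 1
        seg_len = cumulative[seg + 1] - cumulative[seg]
        t = 0.0 if seg_len == 0 else (target - cumulative[seg]) / seg_len
        (x0, y0), (x1, y1) = center_line[seg], center_line[seg + 1]
        # Linear interpolation inside the segment keeps the reading order.
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


# Hypothetical L-shaped center line; each sampled point would back one point query.
center_line = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
points = sample_ordered_points(center_line, num_points=5)
print(points)
```

Because the points follow the center line from start to end, their order matches the reading order of the characters, which is what lets a per-point prediction head emit one character class per point.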

Code

vitae-transformer/deepsolo official

225

vitae-transformer/vitae-transformer…

Tasks

Scene Text Detection

Text Detection

Text Matching

Text Spotting

Datasets

ICDAR 2013

Total-Text

ICDAR 2015

Results from the Paper

Ranked #1 on Text Spotting on Total-Text (using extra training data)
