TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Scene Text Recognition	CUTE80	CDistNet (Ours)	Accuracy	89.58	# 18
Scene Text Recognition	ICDAR2013	CDistNet (Ours)	Accuracy	97.67	# 14
Scene Text Recognition	ICDAR2015	CDistNet (Ours)	Accuracy	86.25	# 12
Scene Text Recognition	IIIT5k	CDistNet (Ours)	Accuracy	96.57	# 16
Scene Text Recognition	SVT	CDistNet (Ours)	Accuracy	93.82	# 19
Scene Text Recognition	SVTP	CDistNet (Ours)	Accuracy	89.77	# 15

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cdistnet-perceiving-multi-domain-character/scene-text-recognition-on-icdar2015)](https://paperswithcode.com/sota/scene-text-recognition-on-icdar2015?p=cdistnet-perceiving-multi-domain-character)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cdistnet-perceiving-multi-domain-character/scene-text-recognition-on-icdar2013)](https://paperswithcode.com/sota/scene-text-recognition-on-icdar2013?p=cdistnet-perceiving-multi-domain-character)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cdistnet-perceiving-multi-domain-character/scene-text-recognition-on-svtp)](https://paperswithcode.com/sota/scene-text-recognition-on-svtp?p=cdistnet-perceiving-multi-domain-character)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cdistnet-perceiving-multi-domain-character/scene-text-recognition-on-iiit5k)](https://paperswithcode.com/sota/scene-text-recognition-on-iiit5k?p=cdistnet-perceiving-multi-domain-character)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cdistnet-perceiving-multi-domain-character/scene-text-recognition-on-cute80)](https://paperswithcode.com/sota/scene-text-recognition-on-cute80?p=cdistnet-perceiving-multi-domain-character)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cdistnet-perceiving-multi-domain-character/scene-text-recognition-on-svt)](https://paperswithcode.com/sota/scene-text-recognition-on-svt?p=cdistnet-perceiving-multi-domain-character)`
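
The six badge lines above share one URL pattern and differ only in the benchmark slug. As a minimal sketch, the pattern can be reproduced as follows; the helper name `pwc_badge` and the variable names are hypothetical and used only for illustration, while the URL layout is copied from the badge lines listed above.

```python
# Hypothetical helper: rebuilds the badge markdown shown above from two slugs.
# The URL layout is taken from the listed badges, not from any official API docs.
def pwc_badge(paper_slug: str, benchmark_slug: str) -> str:
    badge = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{benchmark_slug}"
    )
    link = f"https://paperswithcode.com/sota/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({badge})]({link})"


PAPER = "cdistnet-perceiving-multi-domain-character"
DATASETS = ["icdar2015", "icdar2013", "svtp", "iiit5k", "cute80", "svt"]

for dataset in DATASETS:
    print(pwc_badge(PAPER, f"scene-text-recognition-on-{dataset}"))
```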

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

22 Nov 2021 · Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang

The Transformer-based encoder-decoder framework is becoming popular in scene text recognition, largely because it naturally integrates recognition clues from both visual and semantic domains. However, recent studies show that the two kinds of clues are not always well registered, so features and characters may be misaligned in difficult text (e.g., text with a rare shape). As a result, constraints such as character position have been introduced to alleviate this problem. Despite some success, visual and semantic clues are still modeled separately and are only loosely associated. In this paper, we propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding. MDCDP uses the position embedding to query both visual and semantic features following the cross-attention mechanism. The two kinds of clues are fused into the position branch, generating a content-aware embedding that captures character spacing and orientation variants, character semantic affinities, and clues tying the two kinds of information. They are summarized as the multi-domain character distance. We develop CDistNet, which stacks multiple MDCDPs to guide progressively more precise distance modeling. As a result, feature-character alignment is well established even when various recognition difficulties are present. We verify CDistNet on ten challenging public datasets and two series of augmented datasets created by ourselves. The experiments demonstrate that CDistNet performs highly competitively. It not only achieves top-tier results on standard benchmarks, but also outperforms recent popular methods by clear margins on real and augmented datasets presenting severe text deformation, poor linguistic support, and rare character layouts. Code is available at https://github.com/simplify23/CDistNet.
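
The querying step described in the abstract (a position embedding attending over visual and semantic features, with both results fused back into the position branch, repeated across stacked modules) can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' implementation: it uses single-head attention with no learned projections, fuses the two branches by a plain average with a residual connection, and runs on random inputs. The names `cross_attention` and `mdcdp_sketch` are hypothetical; the official code is at the repository linked above.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, memory):
    """Scaled dot-product attention; memory serves as both keys and values.

    Assumption: no learned query/key/value projections, single head.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        weights = softmax(scores)
        attended.append(
            [sum(w * m[k] for w, m in zip(weights, memory)) for k in range(dim)]
        )
    return attended


def mdcdp_sketch(pos, visual, semantic):
    """Position embedding queries both domains; results are fused into it.

    Assumption: fusion is a plain average plus a residual connection.
    """
    from_visual = cross_attention(pos, visual)
    from_semantic = cross_attention(pos, semantic)
    return [
        [p + 0.5 * (v + s) for p, v, s in zip(p_row, v_row, s_row)]
        for p_row, v_row, s_row in zip(pos, from_visual, from_semantic)
    ]


random.seed(0)
T, N, D = 5, 32, 16  # characters, visual positions, feature size (all illustrative)
pos = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
visual = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
semantic = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]

out = pos
for _ in range(3):  # stacking several modules refines the embedding step by step
    out = mdcdp_sketch(out, visual, semantic)

print(len(out), len(out[0]))  # 5 16
```

The output keeps the shape of the position embedding (one vector per character slot), which is what lets several such modules be stacked.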

Code

simplify23/CDistNet official

106

chibohe/CdistNet-pytorch

Tasks

Position

Scene Text Recognition

Datasets

ICDAR 2013, ICDAR 2015, SVT, CUTE80, SVTP, IIIT5k, MLT17

Results from the Paper

Ranked #12 on Scene Text Recognition on ICDAR2015
