Semantic Reasoning Network

Introduced by Yu et al. in Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Semantic Reasoning Network, or SRN, is an end-to-end trainable framework for scene text recognition. It consists of four parts: a backbone network, a parallel visual attention module (PVAM), a global semantic reasoning module (GSRM), and a visual-semantic fusion decoder (VSFD). Given an input image, the backbone network first extracts 2-D features $V$. The PVAM then generates $N$ aligned 1-D features $G$. Each feature corresponds to one character in the text and captures the visual information aligned with that character. These $N$ features $G$ are fed into the GSRM, which captures the semantic information $S$. Finally, the VSFD fuses the aligned visual features $G$ with the semantic information $S$ to predict $N$ characters. Text strings shorter than $N$ are padded with 'EOS' (end-of-sequence) tokens.
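The data flow above can be traced with a toy NumPy sketch. This is not the authors' implementation. The backbone output is replaced by random features, and all weights are random and untrained. The module internals are assumptions chosen for illustration:

- additive attention over grid positions for the PVAM;
- a single self-attention layer over embedded first-pass predictions, as a simplified stand-in for the GSRM;
- a sigmoid gate for the VSFD.

The names `CHARSET`, `EOS`, `encode`, `decode`, `pvam`, `gsrm`, `vsfd`, and `srn_forward` are hypothetical. What the sketch does show is the shape bookkeeping. A 2-D grid $V$ becomes $N$ aligned vectors $G$, and $G$ yields $N$ semantic vectors $S$. The fused result then gives exactly $N$ class predictions, with EOS padding for short strings.

```python
import numpy as np

# Hypothetical alphabet; class 0 is reserved for the EOS (end-of-sequence) token.
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"
EOS = 0
NUM_CLASSES = len(CHARSET) + 1


def encode(text, n):
    """Map a string to exactly n class ids, padding short strings with EOS."""
    ids = [CHARSET.index(c) + 1 for c in text[:n]]
    return ids + [EOS] * (n - len(ids))


def decode(ids):
    """Read characters up to (not including) the first EOS."""
    chars = []
    for i in ids:
        if i == EOS:
            break
        chars.append(CHARSET[i - 1])
    return "".join(chars)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pvam(V, n, rng):
    """Assumed PVAM: one additive-attention map over the H*W grid per slot t."""
    h, w, d = V.shape
    v = V.reshape(h * w, d)                   # flatten 2-D features to H*W positions
    order_emb = rng.standard_normal((n, d))   # embedding of reading-order index t
    W_o, W_v = rng.standard_normal((d, d)), rng.standard_normal((d, d))
    w_e = rng.standard_normal(d)
    # scores[t, p] = w_e . tanh(W_o order_emb[t] + W_v v[p]), all slots at once
    scores = np.tanh((order_emb @ W_o.T)[:, None, :] + (v @ W_v.T)[None, :, :]) @ w_e
    alpha = softmax(scores, axis=1)           # (n, H*W); each row sums to 1
    return alpha @ v                          # G: (n, d) aligned 1-D features


def gsrm(G, W_cls, emb, rng):
    """Simplified GSRM stand-in: embed first-pass predictions, one self-attention layer."""
    d = G.shape[1]
    first_pass = (G @ W_cls).argmax(axis=1)   # (n,) visual-only character guesses
    E = emb[first_pass]                       # (n, d) character embeddings
    W_q, W_k, W_val = (rng.standard_normal((d, d)) for _ in range(3))
    # Every slot attends to every other slot (global context, no causal mask).
    attn = softmax((E @ W_q) @ (E @ W_k).T / np.sqrt(d), axis=1)
    return attn @ (E @ W_val)                 # S: (n, d) semantic features


def vsfd(G, S, rng):
    """Assumed VSFD: a per-dimension sigmoid gate mixing visual and semantic features."""
    d = G.shape[1]
    W_z = rng.standard_normal((2 * d, d))
    z = 1.0 / (1.0 + np.exp(-(np.concatenate([G, S], axis=1) @ W_z)))
    return z * G + (1.0 - z) * S              # (n, d) fused features


def srn_forward(V, n, rng):
    """Toy forward pass: V -> G -> S -> fused -> n predicted class ids."""
    d = V.shape[-1]
    W_cls = rng.standard_normal((d, NUM_CLASSES))   # shared toy classifier
    emb = rng.standard_normal((NUM_CLASSES, d))
    G = pvam(V, n, rng)
    S = gsrm(G, W_cls, emb, rng)
    return (vsfd(G, S, rng) @ W_cls).argmax(axis=1)


rng = np.random.default_rng(0)
V = rng.standard_normal((8, 32, 16))   # stand-in for backbone output (H=8, W=32, D=16)
pred = srn_forward(V, 25, rng)         # N = 25 chosen arbitrarily
print(pred.shape)                      # (25,)
print(decode(pred))                    # untrained, so the string is arbitrary
print(encode("srn", 6))                # [19, 18, 14, 0, 0, 0]
```

In this sketch, each slot $t$ gets its own attention map, so all $N$ slots are computed in one batched step rather than one character at a time. This fits the "parallel" in the PVAM's name. The EOS padding lets a fixed set of $N$ output slots represent any word of up to $N$ characters.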

Source: Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Tasks

Task	Papers	Share
Object Detection	1	14.29%
Optical Flow Estimation	1	14.29%
Sentence	1	14.29%
Temporal Sentence Grounding	1	14.29%
Change Detection	1	14.29%
Optical Character Recognition (OCR)	1	14.29%
Scene Text Recognition	1	14.29%

Categories

Scene Text Models