The semantic reasoning network (SRN) is an end-to-end trainable framework for scene text recognition that consists of four parts: a backbone network, a parallel visual attention module (PVAM), a global semantic reasoning module (GSRM), and a visual-semantic fusion decoder (VSFD). Given an input image, the backbone network first extracts 2-D features $V$. The PVAM then generates $N$ aligned 1-D features $G$, where each feature corresponds to one character in the text and captures the aligned visual information. These $N$ 1-D features $G$ are fed into the GSRM to capture the semantic information $S$. Finally, the VSFD fuses the aligned visual features $G$ and the semantic information $S$ to predict $N$ characters. Text strings shorter than $N$ are padded with 'EOS' symbols.
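
The fixed-length output convention described above (always $N$ predictions, with 'EOS' filling the unused positions) can be illustrated with a minimal sketch. This is not the authors' code: the names `EOS`, `pad_label`, and `decode_prediction` are hypothetical, the literal string used for the EOS symbol is an assumption, and rejecting strings longer than $N$ is a choice made here because the description does not say how such inputs are handled.

```python
from typing import List

# Hypothetical end-of-sequence symbol; the description only calls it 'EOS'.
EOS = "<EOS>"


def pad_label(text: str, n: int, eos: str = EOS) -> List[str]:
    """Turn a text string into exactly n target symbols, padding with EOS."""
    if len(text) > n:
        # Assumption: over-long strings are rejected; the source does not
        # specify what happens in this case.
        raise ValueError(f"text has {len(text)} characters, more than N={n}")
    return list(text) + [eos] * (n - len(text))


def decode_prediction(symbols: List[str], eos: str = EOS) -> str:
    """Recover the text from n predicted symbols by cutting at the first EOS."""
    chars = []
    for symbol in symbols:
        if symbol == eos:
            break
        chars.append(symbol)
    return "".join(chars)


if __name__ == "__main__":
    N = 8  # assumed maximum text length, chosen only for this example
    target = pad_label("SHOP", N)
    print(target)                     # N symbols: 4 characters, then padding
    print(decode_prediction(target))  # padding stripped again
```

Because every position is predicted in one pass, the decoder always emits $N$ symbols; cutting at the first EOS is one simple way to map them back to a variable-length string.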
Source: Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Task | Papers | Share |
---|---|---|
Object Detection | 1 | 14.29% |
Optical Flow Estimation | 1 | 14.29% |
Sentence | 1 | 14.29% |
Temporal Sentence Grounding | 1 | 14.29% |
Change Detection | 1 | 14.29% |
Optical Character Recognition (OCR) | 1 | 14.29% |
Scene Text Recognition | 1 | 14.29% |