Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since collecting and annotating enough real text images to capture real-world complexity and diversity is difficult.
Visual information extraction (VIE) has attracted increasing attention in recent years.
Specifically, we integrate IFA into the two most prevalent text recognition streams (attention-based and CTC-based) and propose attention-guided dense prediction (ADP) and Extended CTC (ExCTC).
This paper aims to (1) summarize the fundamental problems and the state of the art in scene text recognition; (2) introduce new insights and ideas; (3) provide a comprehensive review of publicly available resources; and (4) point out directions for future work.
To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from using historical decoding results.
Ranked #4 on Scene Text Recognition on ICDAR 2003
Scene text recognition has attracted particular research interest because it is a very challenging problem and has various applications.