Recently, scene text recognition methods based on deep learning have proliferated in the computer vision field.
Many new proposals for scene text recognition (STR) models have been introduced in recent years.
It reduces the difficulty of recognition and lets the attention-based sequence recognition network read irregular text more easily.
Connectionist Temporal Classification (CTC) is an objective function for end-to-end sequence learning, which adopts dynamic programming algorithms to directly learn the mapping between sequences.
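To make the dynamic-programming step concrete, the following is a minimal sketch of the standard CTC forward recursion, which sums the probability of every frame-level alignment that collapses to a given label sequence. The function name `ctc_neg_log_likelihood`, the list-of-lists input layout, and the choice of index 0 for the blank symbol are assumptions made for illustration. The sketch works in plain probabilities for readability; a practical implementation would normally work in log space to avoid underflow.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Return -log p(target | probs) under CTC.

    probs:  list of T rows; probs[t][k] is the probability of class k at frame t.
    target: list of label indices without blanks.
    blank:  index of the blank symbol (assumed to be 0 here).
    """
    # Interleave blanks around the labels: [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all alignments of the frames seen so far
    # that end at position s of the extended sequence.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        prev = alpha
        alpha = [0.0] * S
        for s in range(S):
            total = prev[s]                      # stay on the same symbol
            if s >= 1:
                total += prev[s - 1]             # advance by one position
            # Skipping the blank between two labels is allowed only when
            # those labels differ; repeated labels need the blank.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * probs[t][ext[s]]

    # A valid alignment ends on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf


# Two frames, two classes (blank and "a"), uniform predictions.
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_neg_log_likelihood(uniform, [1])
```

In this toy case three alignments ("aa", "a-", "-a") collapse to the single label, so the likelihood is 3 × 0.25 = 0.75. Minimizing this quantity over a training set is what allows learning without frame-level alignment labels.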
Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion.
We propose a post-processing approach that improves scene text recognition accuracy by using word occurrence probabilities (a unigram language model) and the semantic correlation between scene and text.
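As an illustration of the unigram part of such post-processing, the sketch below re-ranks recognizer candidates by adding a weighted unigram log-probability to each candidate's recognition score. It is not the method described above: the function name `rerank`, the log-linear combination, the `lm_weight` parameter, and the additive smoothing are all assumptions chosen for the example, and the scene-text semantic correlation term is omitted entirely.

```python
import math


def rerank(candidates, unigram_counts, lm_weight=0.5, smoothing=1.0):
    """Re-rank recognizer hypotheses with a unigram language model.

    candidates:     list of (word, recognizer_probability) pairs.
    unigram_counts: dict mapping word -> corpus count.
    lm_weight:      hypothetical interpolation weight for the language model.
    smoothing:      additive smoothing so unseen words keep non-zero mass.
    """
    total = sum(unigram_counts.values())
    vocab = len(unigram_counts) + 1  # one extra slot for unseen words

    def unigram_logp(word):
        count = unigram_counts.get(word, 0)
        return math.log((count + smoothing) / (total + smoothing * vocab))

    # Combined score: recognizer log-probability plus weighted unigram log-probability.
    scored = [
        (math.log(p) + lm_weight * unigram_logp(w), w)
        for w, p in candidates
        if p > 0
    ]
    scored.sort(reverse=True)
    return [w for _, w in scored]


# Toy data invented for this example.
counts = {"coffee": 100, "the": 900}
hypotheses = [("c0ffee", 0.5), ("coffee", 0.4)]
ranked = rerank(hypotheses, counts)
```

With these toy counts, the dictionary word "coffee" overtakes the visually plausible but unseen string "c0ffee", even though the recognizer scored it slightly lower. Setting `lm_weight=0` recovers the recognizer's original order.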
Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications.
Extensive experiments on various benchmarks, including the IIIT5K, SVT and ICDAR datasets, show that NRTR achieves state-of-the-art or highly competitive performance in both lexicon-free and lexicon-based scene text recognition tasks, while requiring an order of magnitude less training time than current methods.