In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition.
In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.
In contrast to most existing works that consist of multiple deep neural networks and several pre-processing steps we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way.
SCENE text recognition has attracted great interest from the academia and the industry in recent years owing to its importance in a wide range of applications.
Text in curve orientation, despite being one of the common text orientations in real world environment, has close to zero existence in well received scene text datasets such as ICDAR2013 and MSRA-TD500.
It decreases the difficulty of recognition and enables the attention-based sequence recognition network to more easily read irregular text.
Many new proposals for scene text recognition (STR) models have been introduced in recent years.
Connectionist Temporal Classification (CTC) is an objective function for end-to-end sequence learning, which adopts dynamic programming algorithms to directly learn the mapping between sequences.