The goal of COCO-Text is to advance the state of the art in text detection and recognition in natural images.
We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.
Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications.
In contrast to most existing works, which consist of multiple deep neural networks and several pre-processing steps, we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way.
An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed.
Existing methods for text recognition mainly work with regular (horizontal and frontal) text and cannot be trivially generalized to handle irregular text.
This paper provides the first thorough documentation of a high-quality digitization process applied to an early printed book from the incunabulum period (1450-1500).