We describe efforts to adapt the Tesseract open source OCR engine for multiple scripts and languages.
We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.
SCENE text recognition has attracted great interest from the academia and the industry in recent years owing to its importance in a wide range of applications.
In contrast to most existing works that consist of multiple deep neural networks and several pre-processing steps we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way.
An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed.
Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized to handle irregular texts.