114 papers with code • 1 benchmark • 23 datasets
We present a neural network model, based on CNNs, RNNs, and a novel attention mechanism, that achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith'16), which achieved 72.46%.
Ranked #1 on Optical Character Recognition on FSNS - Test
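The snippet above names the ingredients (CNN features, an RNN decoder, attention) without describing the mechanism. As a rough illustration only, here is one generic soft-attention step in which a decoder state scores each CNN feature location and the features are averaged by the resulting weights. The dot-product scoring and the names `softmax` and `attend` are assumptions for this sketch, not the paper's novel attention mechanism.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Sequence[float], features: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """One soft-attention step (hypothetical, simplified).

    query:    decoder RNN hidden state, length d
    features: one CNN feature vector (length d) per spatial location
    Returns (weights, context); context is the weighted sum of the features.
    """
    # Assumed scoring: plain dot product. A trained model would learn this.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    w, ctx = attend([4.0, 0.0], feats)
    print([round(x, 3) for x in w], [round(x, 3) for x in ctx])
```

At each decoding step the context vector would be fed to the RNN to predict the next character; the location whose features best match the current state dominates that context.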
We describe efforts to adapt the Tesseract open source OCR engine for multiple scripts and languages.
Meanwhile, several pre-trained models for Chinese and English recognition are released, including a text detector (trained on 97K images), a direction classifier (600K images), and a text recognizer (17.9M images).
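The three released models suggest a detect, orient, then recognize pipeline. The sketch below shows that flow with the three stages passed in as plain functions. All names (`run_pipeline`, `detect`, `crop`, `is_upside_down`, `rotate_180`, `recognize`) are hypothetical, and it assumes the direction classifier only distinguishes upright from 180°-rotated text. It is not the released library's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1) text-region box


@dataclass
class TextLine:
    box: Box
    rotated: bool  # True if the direction classifier flipped this region
    text: str


def run_pipeline(
    image: object,
    detect: Callable[[object], List[Box]],
    crop: Callable[[object, Box], object],
    is_upside_down: Callable[[object], bool],
    rotate_180: Callable[[object], object],
    recognize: Callable[[object], str],
) -> List[TextLine]:
    """Detect text regions, fix their orientation, then recognize each one."""
    results = []
    for box in detect(image):
        region = crop(image, box)
        flipped = is_upside_down(region)  # direction classifier (assumed 0 vs 180 degrees)
        if flipped:
            region = rotate_180(region)
        results.append(TextLine(box=box, rotated=flipped, text=recognize(region)))
    return results


if __name__ == "__main__":
    # Toy stand-ins: the "image" maps boxes to (text, is_flipped) pairs,
    # and a flipped region stores its text reversed.
    toy_image = {(0, 0, 40, 10): ("RUE", False), (0, 20, 40, 30): ("SIRAP", True)}
    lines = run_pipeline(
        toy_image,
        detect=lambda img: list(img.keys()),
        crop=lambda img, box: img[box],
        is_upside_down=lambda region: region[1],
        rotate_180=lambda region: (region[0][::-1], False),
        recognize=lambda region: region[0],
    )
    for line in lines:
        print(line.text)
```

Keeping each stage behind its own callable mirrors the idea of separately trained models: any one of them can be swapped without touching the others.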
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
Ranked #1 on Zero-Shot Transfer Image Classification on aYahoo
We introduce the French Street Name Signs (FSNS) Dataset consisting of more than a million images of street name signs cropped from Google Street View images of France.
Ranked #3 on Optical Character Recognition on FSNS - Test
We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
Ranked #2 on Semantic Segmentation on LIP val
Despite the large number of both commercial and academic methods for Automatic License Plate Recognition (ALPR), most existing approaches are focused on a specific license plate (LP) region (e.g., European, US, Brazilian, Taiwanese, etc.).
Ranked #2 on License Plate Recognition on AOLP-RP
We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.
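The snippet does not say how the coarse stage selects a region. As an illustrative sketch only, the code below assumes the image features are grouped into coarse cells, picks a single cell by hard argmax over mean-pooled cell features, and then runs ordinary soft attention over the fine features inside that cell. The function names and the hard selection rule are assumptions, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: Sequence[Vec]) -> List[float]:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def coarse_to_fine_attend(
    query: Vec, cells: Sequence[Sequence[Vec]]
) -> Tuple[int, List[float], List[float]]:
    """Hypothetical two-stage attention.

    cells: coarse regions, each a list of fine feature vectors.
    Returns (chosen_cell_index, fine_weights, context).
    """
    # Coarse stage: score each region by its mean feature; keep only the best.
    coarse_scores = [dot(query, mean(cell)) for cell in cells]
    best = max(range(len(cells)), key=lambda i: coarse_scores[i])
    # Fine stage: soft attention restricted to the chosen region.
    fine = cells[best]
    weights = softmax([dot(query, f) for f in fine])
    dim = len(fine[0])
    context = [sum(w * f[i] for w, f in zip(weights, fine)) for i in range(dim)]
    return best, weights, context
```

The appeal of the two stages is cost: each decoding step scores every coarse cell plus the fine locations of one cell, rather than every fine location in the image.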