COCO-Text

Introduced by Veit et al. in COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

COCO-Text is a dataset for text detection and recognition in natural images. It is based on the MS COCO dataset, which contains images of complex everyday scenes. COCO-Text images fall into three groups: non-text images, legible text images, and illegible text images. In total, 22,184 training images and 7,026 validation images contain at least one instance of legible text.

Source: Improving Text Proposals for Scene Images with Fully Convolutional Networks
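
The three image groups described above can be illustrated with a small sketch. It assumes each text annotation carries an image id and a legibility label. The record layout, the key names `image_id` and `legibility`, and the function `categorize_images` are hypothetical choices for illustration, not the dataset's documented schema or API.

```python
from collections import defaultdict


def categorize_images(image_ids, annotations):
    """Split image ids into legible-text, illegible-text, and non-text groups.

    `annotations` is an iterable of dicts with the hypothetical keys
    'image_id' and 'legibility' ('legible' or 'illegible').
    """
    # Collect the set of legibility labels seen for each image.
    labels_by_image = defaultdict(set)
    for ann in annotations:
        labels_by_image[ann["image_id"]].add(ann["legibility"])

    groups = {"legible": [], "illegible": [], "non_text": []}
    for image_id in image_ids:
        labels = labels_by_image.get(image_id, set())
        if "legible" in labels:
            # At least one legible text instance.
            groups["legible"].append(image_id)
        elif labels:
            # Text is present, but none of it is legible.
            groups["illegible"].append(image_id)
        else:
            # No text annotations at all.
            groups["non_text"].append(image_id)
    return groups


# Toy records, made up for this example.
image_ids = [1, 2, 3]
annotations = [
    {"image_id": 1, "legibility": "legible"},
    {"image_id": 1, "legibility": "illegible"},
    {"image_id": 2, "legibility": "illegible"},
]
groups = categorize_images(image_ids, annotations)
print(groups)
```

Under this assumed grouping rule, an image counts toward the legible-text totals quoted above as soon as it has one legible instance, even if it also contains illegible text.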
