Modern image captioning systems rely heavily on extracting knowledge from images to convey the story of a static scene.
In this work, we focus on improving the captions generated by such systems.
We propose a visual context dataset for text spotting in the wild: the publicly available COCO-text dataset [Veit et al. 2016] is extended with information about the scene (such as the objects and places appearing in the image), enabling researchers to include semantic relations between text and scene in their text spotting systems and offering a common framework for such approaches.
We present a scenario in which semantic similarity alone is not enough, and we devise a neural approach to learn semantic relatedness between a spotted word and its visual context.
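As a minimal sketch of what such a relatedness learner might look like (the exact architecture may differ from the one in the paper), the snippet below scores a word embedding against a scene embedding with a small MLP trained on related/unrelated pairs; all dimensions, names, and the training setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: an MLP that scores how related a spotted word is to
# the visual context, trained on positive (word plausibly belongs to the
# scene) and negative (random word) pairs.
class RelatednessScorer(nn.Module):
    def __init__(self, word_dim=300, scene_dim=300, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(word_dim + scene_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, word_emb, scene_emb):
        # Concatenate word and scene embeddings; output a relatedness logit.
        return self.net(torch.cat([word_emb, scene_emb], dim=-1)).squeeze(-1)

scorer = RelatednessScorer()
loss_fn = nn.BCEWithLogitsLoss()   # binary: related vs. unrelated pair
word = torch.randn(8, 300)         # batch of word embeddings (e.g. GloVe)
scene = torch.randn(8, 300)        # batch of scene/object embeddings
labels = torch.randint(0, 2, (8,)).float()
loss = loss_fn(scorer(word, scene), labels)
loss.backward()
```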
Finally, we propose a post-processing approach that improves text spotting accuracy by combining word occurrence probabilities (a unigram language model) with the semantic correlation between the scene and the candidate text.
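To make the re-ranking idea concrete, here is a toy sketch, not the exact model from the paper: each candidate transcription from the recognizer is re-scored by combining its recognition confidence with a unigram prior and its semantic relatedness to the scene. The vocabulary, probabilities, embeddings, and weights `alpha`/`beta` are all invented for illustration.

```python
import math
import numpy as np

# Toy stand-ins; a real system would use a corpus-estimated language model
# and learned word/scene embeddings.
UNIGRAM = {"pizza": 1e-4, "pizzeria": 2e-5, "pi22a": 1e-9}
EMB = {
    "pizza": np.array([0.9, 0.1]),
    "pizzeria": np.array([0.8, 0.2]),
    "pi22a": np.array([0.0, 1.0]),
    "restaurant": np.array([0.85, 0.15]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def rerank(candidates, scene_word, alpha=0.5, beta=0.3):
    """Re-score (word, recognizer_confidence) pairs with a unigram prior
    and word/scene semantic relatedness, then sort best-first."""
    scene_vec = EMB[scene_word]
    scored = []
    for word, conf in candidates:
        prior = UNIGRAM.get(word, 1e-12)    # unigram occurrence probability
        rel = cosine(EMB[word], scene_vec)  # semantic correlation with scene
        score = math.log(conf) + alpha * math.log(prior) + beta * rel
        scored.append((word, score))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

# A recognizer that prefers the visually plausible but nonsensical "pi22a"
# is corrected once the language prior and the "restaurant" scene weigh in.
print(rerank([("pi22a", 0.6), ("pizza", 0.3), ("pizzeria", 0.1)], "restaurant"))
```

In this toy run, the recognizer's top guess "pi22a" is demoted because it is both rare under the unigram model and unrelated to the restaurant scene, while "pizza" is promoted to first place.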