Modern image captioning systems rely heavily on extracting knowledge from images to capture the concept of a static story.
In this work, we focus on improving the captions generated by image captioning systems.
In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the number of meanings of a word to its frequency rank, and the meaning-frequency law, relating the frequency of a word to its number of meanings.
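Stated compactly, using the standard formulations of these two laws (the exponents below are Zipf's own rough estimates, not values taken from this work), they read:

```latex
% Law of meaning distribution: the number of meanings m_r of the word
% with frequency rank r decays as a power of the rank.
m_r \propto r^{-\delta}

% Meaning-frequency law: the number of meanings m of a word grows as a
% power of its frequency f; Zipf estimated \delta \approx 1/2 for both.
m \propto f^{\delta}
```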
In this paper, we propose a visual context dataset for Text Spotting in the wild: the publicly available COCO-text dataset [Veit et al. 2016] extended with information about the scene (such as objects and places appearing in the image), enabling researchers to include semantic relations between text and scene in their Text Spotting systems and offering a common framework for such approaches.
We present a scenario where semantic similarity is not enough, and we devise a neural approach to learn semantic relatedness.
We propose a post-processing approach that improves scene text recognition accuracy by combining word occurrence probabilities (a unigram language model) with the semantic correlation between the scene and the candidate text.
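The idea of such a re-ranking step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the unigram probabilities, the `semantic_similarity` helper, and the relatedness table are all hypothetical stand-ins (a real system would use corpus counts and, e.g., embedding cosine similarity against detected scene objects).

```python
import math

# Toy unigram probabilities -- illustrative values only, not corpus-derived.
UNIGRAM = {"bakery": 3e-5, "bakeny": 1e-9, "fun": 2e-4, "jun": 5e-6}

def semantic_similarity(word, scene_objects):
    """Hypothetical stand-in for word/scene relatedness: returns a high
    score if the word is related to a detected object, else a small floor."""
    related = {"bakery": {"bread", "cake", "shop"}, "fun": {"fair", "balloon"}}
    return 1.0 if related.get(word, set()) & set(scene_objects) else 0.1

def rerank(candidates, scene_objects, alpha=0.5):
    """Re-rank OCR hypotheses by an interpolation of log unigram
    probability and log scene-text semantic similarity."""
    def score(word):
        p = UNIGRAM.get(word, 1e-12)  # floor for out-of-vocabulary words
        s = semantic_similarity(word, scene_objects)
        return alpha * math.log(p) + (1 - alpha) * math.log(s)
    return sorted(candidates, key=score, reverse=True)

# A misrecognized "bakeny" is corrected when the scene contains bread/shop:
best = rerank(["bakeny", "bakery"], scene_objects=["bread", "shop"])[0]
print(best)  # "bakery": higher unigram probability and a scene match
```

The interpolation weight `alpha` trades off the language-model prior against the visual context signal; in this sketch both terms are taken in log space so neither dominates by scale alone.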
In this paper, we propose a post-processing approach to improve the accuracy of text spotting by using the semantic relation between the text and the scene.