We propose a template-based image captioning approach to context modelling that generates text-based contextual information from the heatmap and the input data.
High-quality saliency maps are essential in several machine learning application areas including explainable AI and weakly supervised object detection and segmentation.
However, R-MAC degrades in the presence of background clutter, trivial regions, and scale variance, and it discards important spatial information.
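R-MAC (regional maximum activations of convolutions) max-pools CNN activations over a grid of regions at several scales and sums the normalised regional vectors into one descriptor. The sketch below is a simplified NumPy version, not a reference implementation. It omits the PCA-whitening step of the original formulation and only approximates its region-sampling rule. The helper names `l2n`, `region_grid`, and `rmac` are hypothetical. The sketch shows where the limitations above arise: every region, including background clutter, contributes with equal weight, and region coordinates never enter the final vector.

```python
import math
import numpy as np


def l2n(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalise a vector; eps guards against division by zero."""
    return v / (np.linalg.norm(v) + eps)


def region_grid(h: int, w: int, levels: int = 3):
    """Yield (y, x, side) for square regions at scales l = 1..levels.

    Assumption: approximates R-MAC sampling with side = floor(2*min(h, w)/(l+1))
    and roughly 40% overlap between neighbouring regions along each axis.
    """
    for l in range(1, levels + 1):
        side = max(1, int(2 * min(h, w) / (l + 1)))
        stride = 0.6 * side                      # target ~40% overlap
        axes = []
        for dim in (h, w):
            n = 1 + math.ceil((dim - side) / stride)
            axes.append(np.rint(np.linspace(0, dim - side, n)).astype(int))
        for y in axes[0]:
            for x in axes[1]:
                yield int(y), int(x), side


def rmac(fmap: np.ndarray, levels: int = 3) -> np.ndarray:
    """fmap: (K, H, W) conv activations -> (K,) R-MAC-style descriptor.

    PCA whitening of the regional vectors is omitted in this sketch.
    """
    k, h, w = fmap.shape
    agg = np.zeros(k)
    for y, x, side in region_grid(h, w, levels):
        region = fmap[:, y:y + side, x:x + side]
        agg += l2n(region.max(axis=(1, 2)))     # per-channel max over the region
    return l2n(agg)                              # region positions are discarded here


rng = np.random.default_rng(0)
fmap = rng.random((512, 7, 10))                  # stand-in for a conv feature map
print(rmac(fmap).shape)                          # (512,)
```

Because `agg` is a plain sum, swapping which regions hold which activations leaves the descriptor unchanged. This is the sense in which the pooled vector loses spatial layout.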
Recently, Zero-shot Sketch-based Image Retrieval (ZS-SBIR) has attracted the attention of the computer vision community because of its real-world applications and its more realistic and challenging setting compared with SBIR.
However, their invariance to target data is pre-defined by the network architecture and training data.
The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential.
Developing such a generic text eraser for real scenes is challenging, since it inherits all the difficulties of multilingual and curved text detection and of inpainting.
The demand for large-scale trademark retrieval (TR) systems has significantly increased to combat the rise in international trademark infringement.
In this paper, we provide a large-scale dataset with benchmark queries with which different TR approaches can be evaluated systematically.