A Shared Multi-Attention Framework for Multi-Label Zero-Shot Learning

CVPR 2020  ·  Dat Huynh, Ehsan Elhamifar

In this work, we develop a shared multi-attention model for multi-label zero-shot learning. We argue that designing an attention mechanism for recognizing multiple seen and unseen labels in an image is non-trivial: there is no training signal with which to localize unseen labels, and out of thousands of possible labels, an image contains only a few that need attention. Therefore, instead of generating attentions for unseen labels, which have unknown behavior and could focus on irrelevant regions for lack of training samples, we let the unseen labels select among a set of shared attentions that are trained, through our novel loss, to be label-agnostic and to focus only on relevant/foreground regions. Finally, we learn a compatibility function to distinguish labels based on the selected attention. We further propose a novel loss function with three components that guide the attentions toward diverse and relevant image regions while utilizing all attention features. Through extensive experiments, we show that our method improves the state of the art by 2.9% and 1.4% F1 score on the NUS-WIDE and the large-scale Open Images datasets, respectively.
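The core idea above — a small set of label-agnostic shared attention modules, with each label (seen or unseen) selecting the attention feature most compatible with it — can be sketched as follows. This is an illustrative reconstruction, not the authors' released code; all module names, dimensions, and the use of a max over attentions for label selection are assumptions on my part.

```python
import torch
import torch.nn as nn

class SharedMultiAttention(nn.Module):
    """Sketch of a shared multi-attention scorer (hypothetical sizes/names).

    M label-agnostic attention modules each pool region features into one
    attention feature; every label then *selects* the attention whose
    feature scores highest under its compatibility vector, so no per-label
    attention needs to be trained for unseen labels.
    """
    def __init__(self, feat_dim=512, num_attn=8, num_labels=1000):
        super().__init__()
        # one scalar attention score per region, per shared module
        self.attn = nn.Linear(feat_dim, num_attn)
        # compatibility vectors, one per label (a stand-in for projected
        # word embeddings, which is what lets unseen labels be scored)
        self.compat = nn.Linear(feat_dim, num_labels, bias=False)

    def forward(self, regions):  # regions: (B, R, D) CNN region features
        a = self.attn(regions).softmax(dim=1)             # (B, R, M) weights
        feats = torch.einsum('brm,brd->bmd', a, regions)  # (B, M, D) attention features
        scores = self.compat(feats)                       # (B, M, L) compatibilities
        # each label selects its best shared attention via a max
        return scores.max(dim=1).values                   # (B, L) label scores

model = SharedMultiAttention()
x = torch.randn(2, 49, 512)   # e.g. a 7x7 grid of region features per image
print(model(x).shape)         # torch.Size([2, 1000])
```

The max over the attention axis is what makes the attentions shareable: an unseen label never needs its own attention head, only a compatibility vector derived from its semantic embedding.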

| Task | Dataset | Model | Metric | Value | Rank |
|---|---|---|---|---|---|
| Multi-label zero-shot learning | NUS-WIDE | LESA | mAP | 19.4 | #9 |
| Multi-label zero-shot learning | Open Images V4 | LESA | mAP | 41.7 | #4 |
| Multi-label zero-shot learning | Open Images V4 | Attention per cluster | mAP | 40.7 | #6 |
