BEYOND SUPERVISED LEARNING: RECOGNIZING UNSEEN ATTRIBUTE-OBJECT PAIRS WITH VISION-LANGUAGE FUSION AND ATTRACTOR NETWORKS

ICLR 2020 · Hui Chen, Zhixiong Nan, Nanning Zheng ·

This paper handles a challenging problem, unseen attribute-object pair recognition, which asks a model to simultaneously recognize the attribute type and the object type of a given image while this attribute-object pair is not included in the training set. In the past years, the conventional classifier-based methods, which recognize unseen attribute-object pairs by composing separately-trained attribute classifiers and object classifiers, are strongly frustrated. Different from conventional methods, we propose a generative model with a visual pathway and a linguistic pathway. In each pathway, the attractor network is involved to learn the intrinsic feature representation to explore the inner relationship between the attribute and the object. With the learned features in both pathways, the unseen attribute-object pair is recognized by finding out the pair whose linguistic feature closely matches the visual feature of the given image. On two public datasets, our model achieves impressive experiment results, notably outperforming the state-of-the-art methods.

PDF Abstract