VQD: Visual Query Detection in Natural Scenes

We propose Visual Query Detection (VQD), a new visual grounding task. In VQD, a system is guided by natural language to localize a variable number of objects in an image. VQD is related to visual referring expression recognition, where the task is to localize only one object. We describe the first dataset for VQD and we propose baseline algorithms that demonstrate the difficulty of the task compared to referring expression recognition.

NAACL 2019

Datasets


Introduced in the Paper:

VQDv1

Used in the Paper:

MS COCO
Visual Question Answering
RefCOCO

Task: Referring Expression Comprehension
Dataset: VQDv1
Model: Vision+Query
Metric Name: AP@0.5
Metric Value: 31.03
Global Rank: #1
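
The AP@0.5 figure above is most likely average precision at an intersection-over-union (IoU) threshold of 0.5, which is the usual object detection convention; the listing does not spell this out, so treat that reading as an assumption. Under it, a predicted box counts as a correct detection when its IoU with a ground-truth box is at least 0.5. The sketch below shows only that matching test, not the full AP computation. The names `Box`, `iou`, and `is_match` and the corner-coordinate box format are illustrative and do not come from the paper.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth enough to count."""
    return iou(pred, gt) >= threshold
```

Because a VQD query can refer to zero, one, or several objects, a full evaluation would apply this test to every predicted box for a query and rank the predictions by confidence before computing average precision.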
