ViPhy leverages two datasets: Visual Genome (Krishna et al., 2017) and ADE20K (Zhou et al., 2017). The dense captions in Visual Genome provide broad coverage of object classes, making it a suitable resource for collecting subtype candidates. To extract hyponyms from a knowledge base, we acquire "is-a" relations from ConceptNet (Speer et al., 2017) and use them to augment the subtype candidate set. We extract spatial relations from ADE20K, as it provides images categorised by scene type, primarily indoor environments with high object density: {bedroom, bathroom, kitchen, living room, office}.
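The augmentation step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the edge triples are hypothetical stand-ins for "is-a" assertions queried from ConceptNet, and the filtering rule (keep a hyponym only if its hypernym is an already-known object class) is an assumption about how the candidate set is grown.

```python
# Hypothetical sketch: augment a subtype candidate set with hyponyms
# drawn from ConceptNet-style (subject, relation, object) triples.
# The triples below are illustrative, not actual ConceptNet queries.

def augment_with_hyponyms(object_classes, edges):
    """Add the subject of every IsA edge whose object is a known
    object class (e.g. 'armchair IsA chair' adds 'armchair')."""
    augmented = set(object_classes)
    for subj, rel, obj in edges:
        if rel == "IsA" and obj in object_classes:
            augmented.add(subj)
    return augmented

object_classes = {"chair", "table"}
conceptnet_edges = [
    ("armchair", "IsA", "chair"),
    ("desk", "IsA", "table"),
    ("dog", "IsA", "animal"),  # hypernym not a known class; skipped
]
subtypes = augment_with_hyponyms(object_classes, conceptnet_edges)
```

In this sketch, `subtypes` contains the original classes plus `armchair` and `desk`, while `dog` is excluded because its hypernym is not among the known object classes.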