Visual Madlibs is a dataset of 360,001 focused natural language descriptions for 10,738 images. It was collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions of people and objects, their appearances, activities, and interactions, as well as inferences about the general scene or its broader context.

Source: Visual Madlibs: Fill in the blank Image Generation and Question Answering

Source: Yu et al.

License


  • Unknown
