SME (Standard Multimodal Explanation)

Introduced by Xue et al. in Few-Shot Multimodal Explanation for Visual Question Answering

SME is a new dataset for Multi-modal Explanation for Visual Question Answering comprising 1,028,230 samples, with 1,656 visual objects requiring detection in explanations. To our knowledge, this is the first dataset where the explanations are in standard English with additional visual grounding tokens.

Papers


Paper Code Results Date Stars

Dataset Loaders


Tasks


Similar Datasets


License


Modalities


Languages