no code implementations • 17 Apr 2022 • Xuejiao Tang, Tai Le Quy, Eirini Ntoutsi, Kea Turner, Vasile Palade, Israat Haque, Peng Xu, Chris Brown, Wenbin Zhang
Given a question-image input, the Visual Commonsense Reasoning (VCR) model can predict an answer with the corresponding rationale, which requires inference ability from the real world.
1 code implementation • 6 Aug 2021 • Xuejiao Tang, Wenbin Zhang, Yi Yu, Kea Turner, Tyler Derr, Mengyu Wang, Eirini Ntoutsi
While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding on recognition-level but also cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge.