On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize. This is visible in the fact that they are vulnerable to learning coincidental correlations in the data rather than deeper relations between image content and ideas expressed in language... (read more)

Results in Papers With Code
(↓ scroll down to see all results)