no code implementations • 29 Apr 2021 • Shravan Murlidaran, William Yang Wang, Miguel P. Eckstein
Results show that machine/human agreement on scene descriptions is much lower than human/human agreement for our complex scenes.
Sentence Visual Reasoning