…Specifically, the authors construct a dialog grammar grounded in the scene graphs of the images from the CLEVR dataset.