…Filtering images. The first step keeps only images that have meaningful scene graphs and captions. We filtered out all scene graphs that did not contain any edges. images pass this filter. Relationships should be verbs and should not contain nouns or pronouns. We therefore also filter out every scene graph containing an edge whose label is neither tagged as a verb nor included in an ad-hoc list of allowed non-verb keywords.
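The two filtering rules described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `keep_scene_graph`, the `(subject, relation, object)` edge layout, the `"VERB"` tag string, the contents of `ALLOWED_NON_VERBS`, and the toy tagger are all assumptions made for the example. The source does not say which part-of-speech tagger or which keywords were used.

```python
from typing import Callable, Iterable, Tuple

Edge = Tuple[str, str, str]  # (subject, relation, object), an assumed layout

# Hypothetical allow-list: the actual ad-hoc keywords are not given in the source.
ALLOWED_NON_VERBS = frozenset({"on", "in", "of"})


def keep_scene_graph(
    edges: Iterable[Edge],
    pos_tag: Callable[[str], str],
    allowed: frozenset = ALLOWED_NON_VERBS,
) -> bool:
    """Return True if the scene graph passes both filters."""
    edges = list(edges)
    # Rule 1: drop scene graphs that contain no edges.
    if not edges:
        return False
    # Rule 2: drop the graph if any relation is neither tagged as a verb
    # nor present in the allow-list of non-verb keywords.
    for _subject, relation, _object in edges:
        if pos_tag(relation) != "VERB" and relation not in allowed:
            return False
    return True


# Toy stand-in for a real part-of-speech tagger (assumption for this sketch).
TOY_TAGS = {"riding": "VERB", "holding": "VERB", "he": "PRON"}


def toy_tag(word: str) -> str:
    return TOY_TAGS.get(word, "OTHER")
```

Passing the tagger in as a callable keeps the sketch independent of any particular NLP library, so a real tagger could be swapped in without changing the filter logic.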
1 PAPER • 2 BENCHMARKS
…The datasets are generated by repurposing the Visual Genome scene graphs and region descriptions and applying handcrafted templates and GPT-3.
2 PAPERS • 1 BENCHMARK