Grounded Situation Recognition

We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e.g. agent, tool), and bounding-box groundings of entities. GSR presents important technical challenges: identifying semantic saliency, categorizing and localizing a large and diverse set of entities, overcoming semantic sparsity, and disambiguating roles. Moreover, unlike captioning, GSR is straightforward to evaluate. To study this new task, we create the Situations With Groundings (SWiG) dataset, which adds 278,336 bounding-box groundings to the 11,538 entity classes in the imSitu dataset. We propose a Joint Situation Localizer and find that jointly predicting situations and groundings with end-to-end training handily outperforms independent training on the entire grounding metric suite, with relative gains between 8% and 32%. Finally, we show initial findings on three exciting future directions enabled by our models: conditional querying, visual chaining, and grounded semantic aware image retrieval. Code and data are available at https://prior.allenai.org/projects/gsr.
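To make the structured output described above concrete, here is a minimal Python sketch of what one grounded situation could look like: a verb, a set of roles, and for each role a noun plus an optional bounding box. All class and field names, the `(x1, y1, x2, y2)` box convention, the example values, and the 0.5 IoU threshold are assumptions made for illustration; they are not taken from the SWiG release or the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class RoleFiller:
    noun: str            # entity class filling the role
    box: Optional[Box]   # None when the entity has no grounding


@dataclass
class GroundedSituation:
    verb: str                      # the primary activity
    roles: Dict[str, RoleFiller]   # role name -> grounded entity


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def role_matches(pred: RoleFiller, gold: RoleFiller, threshold: float = 0.5) -> bool:
    """Simplified per-role check (an assumption, not the official metric):
    the noun must match, and the boxes must either both be absent or
    overlap with IoU at or above the threshold."""
    if pred.noun != gold.noun:
        return False
    if pred.box is None or gold.box is None:
        return pred.box is None and gold.box is None
    return iou(pred.box, gold.box) >= threshold


# Made-up example, not drawn from the dataset.
gold = GroundedSituation(
    verb="cutting",
    roles={
        "agent": RoleFiller("person", (10.0, 20.0, 110.0, 220.0)),
        "tool": RoleFiller("knife", (60.0, 120.0, 100.0, 140.0)),
    },
)
pred_agent = RoleFiller("person", (12.0, 25.0, 108.0, 215.0))
print(role_matches(pred_agent, gold.roles["agent"]))
```

Keeping the box optional lets the same structure represent a role whose entity is named but not localized, which is one plausible way to separate the "Value" and "Grounded-Value" style of scoring.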

ECCV 2020

Results from the Paper


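Task                            Dataset  Model  Metric Name                   Metric Value  Global Rank
Situation Recognition           imSitu   JSL    Top-1 Verb                    39.94         #6
Situation Recognition           imSitu   JSL    Top-1 Verb & Value            31.44         #5
Situation Recognition           imSitu   JSL    Top-5 Verbs                   67.6          #6
Situation Recognition           imSitu   JSL    Top-5 Verbs & Value           51.88         #6
Situation Recognition           imSitu   ISL    Top-1 Verb                    39.36         #7
Situation Recognition           imSitu   ISL    Top-1 Verb & Value            30.09         #7
Situation Recognition           imSitu   ISL    Top-5 Verbs                   65.51         #7
Situation Recognition           imSitu   ISL    Top-5 Verbs & Value           50.16         #8
Grounded Situation Recognition  SWiG     ISL    Top-1 Verb                    39.36         #7
Grounded Situation Recognition  SWiG     ISL    Top-1 Verb & Value            30.09         #8
Grounded Situation Recognition  SWiG     ISL    Top-1 Verb & Grounded-Value   22.73         #6
Grounded Situation Recognition  SWiG     ISL    Top-5 Verbs                   65.51         #7
Grounded Situation Recognition  SWiG     ISL    Top-5 Verbs & Value           50.16         #8
Grounded Situation Recognition  SWiG     ISL    Top-5 Verbs & Grounded-Value  36.6          #6
Grounded Situation Recognition  SWiG     JSL    Top-1 Verb                    39.94         #6
Grounded Situation Recognition  SWiG     JSL    Top-1 Verb & Value            31.44         #6
Grounded Situation Recognition  SWiG     JSL    Top-1 Verb & Grounded-Value   24.86         #5
Grounded Situation Recognition  SWiG     JSL    Top-5 Verbs                   67.6          #6
Grounded Situation Recognition  SWiG     JSL    Top-5 Verbs & Value           51.88         #6
Grounded Situation Recognition  SWiG     JSL    Top-5 Verbs & Grounded-Value  40.6          #5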
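
The relative gain of JSL over ISL on the two grounded metrics listed for SWiG can be worked out directly from the values above. The short Python sketch below does that arithmetic; the function name and dictionary layout are illustrative choices, and JSL and ISL are taken to be the joint and independent models. The 8% to 32% range quoted in the abstract refers to the entire grounding metric suite, of which only these two metrics are listed here, so the figures computed need not span that full range.

```python
def relative_gain(joint: float, independent: float) -> float:
    """Relative improvement of the joint score over the independent one, in percent."""
    return (joint - independent) / independent * 100.0


# (JSL, ISL) values copied from the SWiG rows of the results listing.
swig_grounded = {
    "Top-1 Verb & Grounded-Value": (24.86, 22.73),
    "Top-5 Verbs & Grounded-Value": (40.6, 36.6),
}

for metric, (jsl, isl) in swig_grounded.items():
    print(f"{metric}: {relative_gain(jsl, isl):.1f}% relative gain")
# Top-1 Verb & Grounded-Value: 9.4% relative gain
# Top-5 Verbs & Grounded-Value: 10.9% relative gain
```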
