Grounded Situation Recognition with Transformers

19 Nov 2021 · Junhyeong Cho, Youngseok Yoon, Hyeonjun Lee, Suha Kwak

Grounded Situation Recognition (GSR) is the task of not only classifying a salient action (verb), but also predicting the entities (nouns) associated with its semantic roles and their locations in the given image. Inspired by the remarkable success of Transformers in vision tasks, we propose a GSR model based on a Transformer encoder-decoder architecture. The attention mechanism of our model enables accurate verb classification by effectively capturing high-level semantic features of an image, and allows the model to flexibly handle the complicated, image-dependent relations between entities for improved noun classification and localization. Our model is the first Transformer architecture for GSR, and achieves the state of the art in every evaluation metric on the SWiG benchmark. Our code is available at https://github.com/jhcho99/gsrtr .
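The encoder-decoder design described above can be sketched in PyTorch. This is a minimal, hypothetical illustration, not the authors' implementation: all names and dimensions (e.g. `d_model`, `num_roles`, the pooled verb head) are assumptions, and real GSR models condition role queries on the predicted verb and use role-specific classifiers.

```python
import torch
import torch.nn as nn

class GSRTransformerSketch(nn.Module):
    """Hypothetical GSR-style model: an encoder refines image features for
    verb classification; a decoder turns per-role queries into a noun label
    and a bounding box for each semantic role. Sizes are illustrative only."""

    def __init__(self, d_model=256, num_verbs=504, num_nouns=1000, num_roles=6):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        # One learnable query per semantic role (assumed design choice)
        self.role_queries = nn.Parameter(torch.randn(num_roles, d_model))
        self.verb_head = nn.Linear(d_model, num_verbs)  # salient action
        self.noun_head = nn.Linear(d_model, num_nouns)  # entity per role
        self.box_head = nn.Linear(d_model, 4)           # (cx, cy, w, h) per role

    def forward(self, image_tokens):
        # image_tokens: (batch, num_patches, d_model), e.g. flattened CNN features
        memory = self.encoder(image_tokens)
        # Verb predicted from pooled encoder output
        verb_logits = self.verb_head(memory.mean(dim=1))
        # Role queries cross-attend to image features in the decoder
        queries = self.role_queries.unsqueeze(0).expand(image_tokens.size(0), -1, -1)
        role_feats = self.decoder(queries, memory)
        noun_logits = self.noun_head(role_feats)
        boxes = self.box_head(role_feats).sigmoid()  # normalized coordinates
        return verb_logits, noun_logits, boxes

model = GSRTransformerSketch()
verb_logits, noun_logits, boxes = model(torch.randn(2, 49, 256))
```

With a batch of 2 images and 49 feature tokens each, the sketch yields verb logits of shape `(2, 504)`, noun logits of shape `(2, 6, 1000)`, and boxes of shape `(2, 6, 4)`.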


Results from the Paper

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Situation Recognition | imSitu | GSRTR | Top-1 Verb | 40.63 | #5 |
| Situation Recognition | imSitu | GSRTR | Top-1 Verb & Value | 32.15 | #4 |
| Situation Recognition | imSitu | GSRTR | Top-5 Verbs | 69.81 | #4 |
| Situation Recognition | imSitu | GSRTR | Top-5 Verbs & Value | 54.13 | #5 |
| Grounded Situation Recognition | SWiG | GSRTR | Top-1 Verb | 40.63 | #5 |
| Grounded Situation Recognition | SWiG | GSRTR | Top-1 Verb & Value | 32.15 | #5 |
| Grounded Situation Recognition | SWiG | GSRTR | Top-1 Verb & Grounded-Value | 25.49 | #4 |
| Grounded Situation Recognition | SWiG | GSRTR | Top-5 Verbs | 69.81 | #4 |
| Grounded Situation Recognition | SWiG | GSRTR | Top-5 Verbs & Value | 54.13 | #5 |
| Grounded Situation Recognition | SWiG | GSRTR | Top-5 Verbs & Grounded-Value | 42.50 | #4 |