Videos as Space-Time Region Graphs

ECCV 2018 · Xiaolong Wang, Abhinav Gupta

How do humans recognize the action "opening a book"? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long-range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long-range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks (GCNs). We achieve state-of-the-art results on both the Charades and Something-Something datasets. In particular, on Charades we obtain a substantial 4.4% gain when our model is applied in complex environments.
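
The core operation described in the abstract — graph-convolutional reasoning over region-proposal nodes connected by a similarity graph — can be illustrated with a minimal sketch. This is not the authors' code: it assumes per-region features have already been pooled from the video backbone, and the names `similarity_gcn_layer`, `feats`, and `weight` are illustrative only.

```python
# Minimal sketch of one graph-convolution step over region-proposal features.
# Assumptions (not from the paper's released code):
#   - feats: (N, d) array, one d-dim feature per object region proposal
#     taken from different frames of the video (the graph nodes)
#   - the similarity graph is built from pairwise dot products and
#     row-normalized with a softmax
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def similarity_gcn_layer(feats, weight):
    """One GCN layer Z = A X W on the similarity graph.

    feats:  (N, d) region features (graph nodes).
    weight: (d, d_out) projection matrix (learnable in a real model).
    """
    sim = feats @ feats.T        # pairwise similarity between all regions
    adj = softmax(sim, axis=1)   # row-normalized adjacency (each row sums to 1)
    return np.maximum(adj @ feats @ weight, 0.0)  # GCN update + ReLU

# Toy usage: 8 region proposals with 16-dim features, projected to 16 dims.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16)).astype(np.float32)
weight = (0.1 * rng.standard_normal((16, 16))).astype(np.float32)
out = similarity_gcn_layer(feats, weight)
print(out.shape)  # (8, 16)
```

Each such layer mixes features between correlated regions through the softmax-normalized adjacency before the linear projection; in the paper's setup, stacked GCN layers update the node features, which are then pooled and combined with the backbone video features for classification.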


Results from the Paper


Ranked #34 on Action Classification on Charades (using extra training data)

Task                   | Dataset                 | Model        | Metric Name    | Metric Value | Global Rank | Uses Extra Training Data
-----------------------|-------------------------|--------------|----------------|--------------|-------------|-------------------------
Action Classification  | Charades                | STRG         | mAP            | 39.7         | #34         | Yes
Action Recognition     | Something-Something V1  | NL I3D + GCN | Top 1 Accuracy | 46.1         | #67         | —

Methods