Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos

arXiv preprint 2019 · Okan Köpüklü, Fabian Herzog, Gerhard Rigoll

Understanding actions and gestures in video streams requires temporal reasoning of the spatial content from different time instants, i.e., spatiotemporal (ST) modeling. In this paper, we have made a comparative analysis of different ST modeling techniques...

TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK
Action Recognition | Something-Something V2 | STM + TRNMultiscale | Top-1 Accuracy | 47.73 | #14
