Spatio-temporal action localization is a challenging yet fascinating task that aims to detect and classify human actions in video clips.
We propose the ACtion Tubelet detector (ACT-detector), which takes a sequence of frames as input and outputs tubelets, i.e., sequences of bounding boxes with associated scores.
#3 best model for Temporal Action Localization on J-HMDB-21
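To make the tubelet output format described above concrete, here is a minimal sketch of a tubelet as a data structure: one bounding box per input frame plus a score per action class. The class name `Tubelet`, the `(x1, y1, x2, y2)` box convention, the per-class score dictionary, and the `best_label` helper are all illustrative assumptions and are not taken from the ACT-detector implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed box convention: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """Hypothetical container: a sequence of boxes with associated scores."""

    boxes: List[Box]          # one box per frame of the input sequence
    scores: Dict[str, float]  # assumed layout: one score per action class

    def __post_init__(self) -> None:
        # A tubelet must cover at least one frame and carry at least one score.
        if not self.boxes:
            raise ValueError("a tubelet needs at least one bounding box")
        if not self.scores:
            raise ValueError("a tubelet needs at least one class score")

    def best_label(self) -> Tuple[str, float]:
        """Return the highest-scoring action class and its score."""
        label = max(self.scores, key=self.scores.get)
        return label, self.scores[label]


# Usage: a two-frame tubelet with made-up boxes and scores.
tubelet = Tubelet(
    boxes=[(10.0, 20.0, 50.0, 80.0), (12.0, 21.0, 52.0, 81.0)],
    scores={"run": 0.7, "walk": 0.2},
)
label, score = tubelet.best_label()
```

Storing the boxes as a single sequence keeps the per-frame detections tied together, which is the property that distinguishes a tubelet from independent per-frame detections.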
In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images.
#4 best model for Skeleton Based Action Recognition on JHMDB (2D poses only)