Spatio-Temporal Action Localization
13 papers with code • 1 benchmark • 5 datasets
We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.
We propose the ACtion Tubelet detector (ACT-detector) that takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores.
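The tubelet output described above can be pictured as a simple container: one box per input frame plus a single confidence score. A minimal sketch (the class and field names are illustrative, not the ACT-detector's actual API):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tubelet:
    """One detected action track over a short frame sequence."""
    # one axis-aligned box (x1, y1, x2, y2) per input frame
    boxes: List[Tuple[float, float, float, float]]
    # confidence that this tubelet depicts the given action class
    score: float

    def __len__(self) -> int:
        return len(self.boxes)

# a tubelet spanning two consecutive frames
t = Tubelet(boxes=[(10, 20, 50, 80), (12, 21, 52, 82)], score=0.9)
```

Per-frame tubelets from overlapping clips are then typically linked over time to form full-video action tubes.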
This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.
Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection
In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images.
Detecting human-object interactions (HOI) is an important step toward comprehensive visual understanding by machines.
Despite the simplicity of our approach, our lightweight end-to-end architecture achieves a state-of-the-art frame-mAP of 74.7% on the challenging UCF101-24 dataset, a gain of 6.4% over the previous best online methods.
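Frame-mAP, the metric quoted above, scores per-frame detections: a detection is a true positive if its IoU with an unmatched ground-truth box of the same class meets a threshold (commonly 0.5), and average precision is then the area under the precision-recall curve, averaged over classes. A sketch for a single class (function names and the uninterpolated AP are illustrative; benchmark code may interpolate differently):

```python
import numpy as np

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def frame_ap(dets, gts, thr=0.5):
    """AP for one class.
    dets: list of (frame_id, box, score); gts: {frame_id: [box, ...]}."""
    dets = sorted(dets, key=lambda d: -d[2])       # highest score first
    matched = {f: [False] * len(b) for f, b in gts.items()}
    tps = []
    for f, box, _ in dets:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts.get(f, [])):
            o = iou(box, g)
            if o > best and not matched[f][j]:
                best, best_j = o, j
        if best >= thr:                            # greedy one-to-one match
            matched[f][best_j] = True
            tps.append(1.0)
        else:
            tps.append(0.0)
    tps = np.array(tps)
    n_gt = sum(len(b) for b in gts.values())
    recall = np.cumsum(tps) / n_gt
    precision = np.cumsum(tps) / (np.arange(len(tps)) + 1)
    # AP as area under the precision-recall curve (no interpolation)
    return float(np.sum(np.diff(np.concatenate([[0.0], recall])) * precision))
```

Frame-mAP then averages this AP over all action classes; video-mAP applies the same idea to linked 3D tubes instead of per-frame boxes.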
Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.
Video action detection (spatio-temporal action localization) is typically the starting point for human-centric intelligent analysis of videos.