Pose-conditioned Spatio-Temporal Attention for Human Action Recognition

We address human action recognition from multi-modal video data involving articulated pose and RGB frames and propose a two-stream approach. The pose stream is processed with a convolutional model taking as input a 3D tensor holding data from a sub-sequence...
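The abstract above is truncated, so it does not say how the 3D pose tensor is laid out or what the convolutional model looks like. The sketch below is only an illustration under stated assumptions. It assumes the tensor is indexed as (time, joint, coordinate). It stacks a window of consecutive poses into that shape, then applies one generic 2D convolution over the (time, joint) plane, treating the coordinates as input channels. Both helper names, `pose_subsequence_tensor` and `conv2d_valid`, are hypothetical and are not taken from the paper. This is not the authors' pose-stream architecture.

```python
from typing import List

# Hypothetical type aliases: a pose is a list of joints, each joint a list of coordinates.
Joint = List[float]
Pose = List[Joint]
Tensor3D = List[List[List[float]]]  # assumed layout: [time][joint][coordinate]


def pose_subsequence_tensor(poses: List[Pose], start: int, length: int) -> Tensor3D:
    """Stack `length` consecutive poses starting at `start` into a T x J x C tensor.

    Hypothetical helper; the (time, joint, coordinate) layout is an assumption.
    """
    if start < 0 or length <= 0 or start + length > len(poses):
        raise ValueError("sub-sequence window falls outside the pose sequence")
    window = poses[start:start + length]
    num_joints = len(window[0])
    num_coords = len(window[0][0])
    for pose in window:
        if len(pose) != num_joints or any(len(j) != num_coords for j in pose):
            raise ValueError("all poses must have the same joints and coordinates")
    # Copy each joint so that later edits to the tensor do not alias the input poses.
    return [[list(joint) for joint in pose] for pose in window]


def conv2d_valid(x: Tensor3D, kernel: Tensor3D) -> List[List[float]]:
    """Single-filter 2D convolution over the (time, joint) plane with 'valid' padding.

    Coordinates act as input channels and the kernel is indexed [dt][dj][c].
    As in most deep-learning frameworks, this is cross-correlation (no kernel flip).
    """
    T, J, C = len(x), len(x[0]), len(x[0][0])
    kT, kJ = len(kernel), len(kernel[0])
    out: List[List[float]] = []
    for t in range(T - kT + 1):
        row: List[float] = []
        for j in range(J - kJ + 1):
            acc = 0.0
            for dt in range(kT):
                for dj in range(kJ):
                    for c in range(C):
                        acc += x[t + dt][j + dj][c] * kernel[dt][dj][c]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    # Toy sequence: 5 frames, 4 joints, 3 coordinates per joint.
    poses = [[[float(f), float(j), 1.0] for j in range(4)] for f in range(5)]
    x = pose_subsequence_tensor(poses, start=1, length=3)  # shape 3 x 4 x 3
    kernel = [[[1.0] * 3 for _ in range(2)] for _ in range(2)]  # 2 x 2 x 3 of ones
    print(conv2d_valid(x, kernel))  # 2 x 3 feature map
```

Putting time and joints on the two spatial axes means a single filter responds to local patterns over neighbouring frames and neighbouring joints at once. In practice a model like this would use a deep-learning framework with many filters and learned weights. The pure-Python loops are used here only to keep the example self-contained.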

Results in Papers With Code