Zero-shot action recognition aims to recognize unseen/novel action categories by exploiting semantic relationships between them and seen/known actions.
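The core idea can be sketched with a toy nearest-neighbor classifier in a shared semantic space. The class names, embeddings, and the `predict_unseen` helper below are all hypothetical placeholders, not any specific paper's method; real systems learn the visual-to-semantic projection and use word vectors or attribute embeddings.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical semantic embeddings for unseen action labels
# (in practice these would come from word vectors or attributes).
unseen_classes = {
    "jogging": np.array([0.9, 0.2, 0.0]),
    "diving":  np.array([0.1, 0.8, 0.4]),
}

def predict_unseen(visual_semantic):
    # Assign the unseen class whose semantic embedding is most
    # similar to the visual feature projected into semantic space.
    return max(unseen_classes,
               key=lambda c: cosine(visual_semantic, unseen_classes[c]))

# A projected feature close to "running" semantics maps to "jogging".
print(predict_unseen(np.array([0.95, 0.15, 0.05])))  # jogging
```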
For the backbone, we propose multi-branch multi-scale graph convolutional networks to extract spatial and temporal features.
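A single spatial graph-convolution step over skeleton joints can be illustrated with plain numpy; the 3-joint chain graph, the input features, and the identity weight matrix below are illustrative assumptions, not the proposed multi-branch multi-scale architecture.

```python
import numpy as np

# Toy skeleton graph: 3 joints connected in a chain (0-1-2).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_hat = A + np.eye(3)                     # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # degree normalization
A_norm = D_inv @ A_hat

X = np.array([[0., 1.],   # per-joint input features (3 joints x 2 channels)
              [1., 0.],
              [0., 1.]])
W = np.eye(2)             # placeholder for a learned weight matrix

# One graph-convolution step: aggregate neighbor features, then project.
H = A_norm @ X @ W
print(H)
```

Each output row mixes a joint's own features with those of its skeletal neighbors; stacking such layers at multiple graph scales is what gives GCN backbones their spatial receptive field.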
We demonstrate the method with six state-of-the-art 3D convolutional neural networks (CNNs) on three action recognition datasets (Kinetics-400, UCF-101, and HMDB-51) and two egocentric action recognition datasets (EPIC-Kitchens and EGTEA Gaze+).
The proposed representation has the advantage of combining reference joints with a tree-structured skeleton.
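The reference-joint part of such a representation can be sketched as follows; the joint coordinates and the choice of the hip as reference are illustrative assumptions, and the tree-structure traversal is omitted.

```python
import numpy as np

# Toy 3D skeleton: joints x (x, y, z); joint 0 plays the hip/reference role.
joints = np.array([[1.0, 1.0, 0.0],   # hip (reference)
                   [1.0, 2.0, 0.0],   # spine
                   [1.5, 3.0, 0.0]])  # head

def reference_relative(joints, ref=0):
    # Express every joint relative to the reference joint, making the
    # representation invariant to the skeleton's absolute position.
    return joints - joints[ref]

rel = reference_relative(joints)
print(rel[0])  # reference joint maps to the origin
```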
Human Action Recognition is an important task in Human-Robot Interaction, as cooperation between robots and humans requires that artificial agents recognise complex cues from the environment.
Thanks to the availability of large-scale skeleton datasets, 3D human action recognition has recently attracted the attention of the computer vision community.
Diffusion effectively combines two aspects of information, i.e., localized and holistic, for more powerful representation learning.
Skeleton data have been widely used for action recognition tasks, since they robustly accommodate dynamic circumstances and complex backgrounds.
Human action recognition remains a challenging task, partly due to large variations in how actions are executed.
Our method is ranked first in the public leaderboard of the EPIC-Kitchens egocentric action anticipation challenge 2019.