Zero-shot action recognition aims to recognize unseen/novel action categories by exploiting semantic information and seen/known actions.
The main contribution of this method lies in a new design principle that learns graph Laplacians as convex combinations of elementary Laplacians, each dedicated to a particular topology of the input graphs.
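The convex-combination idea above can be sketched in a few lines: softmax weights guarantee the mixture of elementary Laplacians stays convex. This is a minimal NumPy illustration; the function and variable names (`convex_laplacian`, `alpha_logits`) are our own, not the paper's API.

```python
import numpy as np

def convex_laplacian(elementary_laplacians, alpha_logits):
    """Combine Laplacians with softmax weights so the mixture is convex."""
    logits = np.asarray(alpha_logits, dtype=float)
    alphas = np.exp(logits - logits.max())
    alphas /= alphas.sum()                      # weights are >= 0 and sum to 1
    return sum(a * L for a, L in zip(alphas, elementary_laplacians))

# Two toy 3-node topologies: a path graph and a cycle graph.
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_cycle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
lap = lambda A: np.diag(A.sum(1)) - A           # combinatorial Laplacian

L_mix = convex_laplacian([lap(A_path), lap(A_cycle)], alpha_logits=[0.0, 0.0])
print(L_mix)  # equal logits -> the average of the two Laplacians
```

In a learned setting the `alpha_logits` would be trainable parameters; the softmax keeps the result a valid (convex) combination throughout training.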
With this model, we generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV ("Procedural Human Action Videos").
Motivated by the often distinctive temporal characteristics of actions in either horizontal or vertical direction, we introduce a novel convolution block for CNN architectures with video input.
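One way to realize such a direction-aware block is to filter a video volume along the (time, width) planes in one branch and the (time, height) planes in another, then fuse the two responses. The sketch below is an assumption on our part (kernel shapes, fusion by addition, and all names are illustrative, not the paper's exact design):

```python
import numpy as np

def xcorr2(x, k):
    """Valid 2-D cross-correlation of x (M, N) with kernel k (km, kn)."""
    km, kn = k.shape
    out = np.empty((x.shape[0] - km + 1, x.shape[1] - kn + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + km, j:j + kn] * k).sum()
    return out

def directional_block(video, k_tw, k_th):
    """video: (T, H, W). One branch correlates each row plane over
    (time, width), the other each column plane over (time, height);
    both responses are center-cropped to a common shape and summed.
    Assumes odd-sized kernels."""
    horiz = np.stack([xcorr2(video[:, h, :], k_tw)
                      for h in range(video.shape[1])], axis=1)
    vert = np.stack([xcorr2(video[:, :, w], k_th)
                     for w in range(video.shape[2])], axis=2)
    ph = (k_th.shape[1] - 1) // 2               # crop H of the horizontal branch
    pw = (k_tw.shape[1] - 1) // 2               # crop W of the vertical branch
    return (horiz[:, ph:video.shape[1] - ph, :]
            + vert[:, :, pw:video.shape[2] - pw])

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 6, 6))          # (T, H, W) toy clip
k = np.ones((3, 3)) / 9.0                       # 3x3 averaging kernel
out = directional_block(video, k, k)
print(out.shape)  # (6, 4, 4)
```

In a real CNN the two branches would be learned 3-D convolutions; the point of the sketch is only the separate horizontal-temporal and vertical-temporal filtering paths.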
Automatic detection of individual intake gestures during eating occasions has the potential to improve dietary monitoring and support dietary recommendations.
Existing methods for infrared action recognition rely on either spatial or local temporal information; however, they do not consider global temporal information, which can better describe the movements of body parts across the whole video.
We also explore the possibility of semantically imperceptible localized attacks with CIASA, and succeed in fooling state-of-the-art skeleton-based action recognition models with high confidence.
Zero-shot action recognition has attracted attention in recent years, and many approaches have been proposed for recognizing objects, events, and actions in images and videos.
We also introduce a novel pooling operator on graphs that proceeds in two steps: context-dependent node expansion, followed by global average pooling. The strength of this two-step process lies in its ability to preserve the discriminative power of nodes while achieving permutation invariance.
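The two-step pooling above can be illustrated minimally: here the context-dependent expansion is approximated by one round of degree-normalized neighborhood aggregation (an assumption on our part), and averaging over nodes then yields permutation invariance. All names are illustrative.

```python
import numpy as np

def two_step_pool(X, A):
    """X: (n, d) node features; A: (n, n) adjacency. Returns a (d,) graph vector."""
    A_hat = A + np.eye(A.shape[0])              # keep each node's own features
    deg = A_hat.sum(1, keepdims=True)
    expanded = (A_hat @ X) / deg                # step 1: context-dependent expansion
    return expanded.mean(axis=0)                # step 2: global average pooling

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

g = two_step_pool(X, A)
perm = [2, 0, 1]                                # relabel the nodes
g_perm = two_step_pool(X[perm], A[np.ix_(perm, perm)])
print(np.allclose(g, g_perm))  # True: pooled vector is permutation invariant
```

Because both steps act symmetrically on nodes, reordering the nodes (and permuting the adjacency consistently) leaves the pooled vector unchanged.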