Few-Shot Action Recognition
24 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs
By decomposing and learning the temporal changes in visual relationships that result in an action, we demonstrate the utility of a hierarchical event decomposition by enabling few-shot action recognition, achieving 42.7% mAP using as few as 10 examples.
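As a rough illustration of the idea (labels and structure are hypothetical, not drawn from the actual Action Genome annotations), an action can be encoded as a sequence of per-frame scene graphs whose changing relationships compose the event:

```python
# Hypothetical example: an action as a sequence of per-frame scene graphs,
# each a set of (subject, relationship, object) triples.
action = [
    # frame 0: person reaches toward a cup
    {("person", "looking_at", "cup"), ("person", "reaching_for", "cup")},
    # frame 1: contact is established
    {("person", "touching", "cup"), ("person", "looking_at", "cup")},
    # frame 2: the cup is held -- the temporal change in relationships
    # ("reaching_for" -> "touching" -> "holding") is what composes the action
    {("person", "holding", "cup")},
]
```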
Temporal-Relational CrossTransformers for Few-Shot Action Recognition
We propose a novel approach to few-shot action recognition, finding temporally-corresponding frame tuples between the query and videos in the support set.
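A minimal sketch of tuple-level query/support matching in this spirit, assuming pre-computed per-frame features; the pair construction, attention, and distance below are illustrative stand-ins for the paper's CrossTransformer, not its implementation:

```python
import itertools
import torch
import torch.nn.functional as F

def frame_pairs(feats):
    # feats: (T, D) per-frame features -> all ordered frame pairs, (P, 2D)
    pairs = itertools.combinations(range(feats.shape[0]), 2)
    return torch.stack([torch.cat([feats[i], feats[j]]) for i, j in pairs])

def class_score(query, support):
    # query: (T, D); support: (K, T, D), K shots of a single class
    q = frame_pairs(query)                                   # (P, 2D)
    s = torch.cat([frame_pairs(v) for v in support])         # (K*P, 2D)
    attn = F.softmax(q @ s.t() / q.shape[1] ** 0.5, dim=-1)  # cross-attention weights
    proto = attn @ s                                         # query-specific prototypes
    return -((q - proto) ** 2).sum(-1).mean()                # higher = closer

T, D, K = 8, 64, 5
query = torch.randn(T, D)
support = {c: torch.randn(K, T, D) for c in range(5)}        # 5-way episode
pred = max(support, key=lambda c: class_score(query, support[c]))
```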
Few-shot Action Recognition with Permutation-invariant Attention
Such encoded blocks are aggregated by permutation-invariant pooling to make our approach robust to varying action lengths and long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class.
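A minimal sketch of the aggregation idea, with mean pooling standing in for the paper's attention-based permutation-invariant pooling over block embeddings (shapes are assumptions):

```python
import torch

def aggregate(block_embeddings):
    # block_embeddings: (num_blocks, D). The mean is invariant to the
    # order of blocks and yields a fixed-size output regardless of their
    # count, so clips of different lengths map to comparable descriptors.
    return block_embeddings.mean(dim=0)

short_clip = torch.randn(4, 128)    # 4 temporal blocks
long_clip = torch.randn(16, 128)    # 16 temporal blocks
assert aggregate(short_clip).shape == aggregate(long_clip).shape
```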
Few-shot Action Recognition with Prototype-centered Attentive Learning
Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.
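For context, a minimal sketch of the prototype-centered idea in its simplest form (a prototypical-network-style classifier over assumed embeddings; not the authors' attentive model):

```python
import torch

def classify(query_emb, support_embs):
    # support_embs: (C, K, D) -- C classes, K shots each.
    prototypes = support_embs.mean(dim=1)                # per-class mean, (C, D)
    dists = torch.cdist(query_emb[None], prototypes)[0]  # distance to each prototype
    return int(dists.argmin())                           # nearest-prototype label

support_embs = torch.randn(5, 5, 256)  # a 5-way 5-shot episode
query_emb = torch.randn(256)
print(classify(query_emb, support_embs))
```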
Home Action Genome: Cooperative Compositional Action Understanding
There remains a lack of studies that extend action composition and leverage multiple viewpoints and multiple modalities of data for representation learning.
TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition
The first stage locates the action by learning a temporal affine transform, which warps each video feature to its action duration while dismissing action-irrelevant features (e.g., background).
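A minimal sketch of a learned temporal affine warp, implemented here as a 1-D spatial transformer along the time axis; parameter names and the interpolation scheme are assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def temporal_affine_warp(feats, scale, shift):
    # feats: (D, T). Resample the sequence at t' = scale * t + shift
    # (normalized time in [-1, 1]), zooming into the predicted action span.
    D, T = feats.shape
    t = torch.linspace(-1.0, 1.0, T)
    grid_t = (scale * t + shift).clamp(-1.0, 1.0)
    inp = feats[None, :, None, :]                       # (1, D, 1, T): time as width
    grid = torch.stack([grid_t, torch.zeros_like(grid_t)], dim=-1)[None, None]
    out = F.grid_sample(inp, grid, align_corners=True)  # (1, D, 1, T)
    return out[0, :, 0, :]

feats = torch.randn(64, 16)
warped = temporal_affine_warp(feats, scale=torch.tensor(0.5), shift=torch.tensor(0.1))
print(warped.shape)  # torch.Size([64, 16])
```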
A New Split for Evaluating True Zero-Shot Action Recognition
We benchmark several recent approaches on the proposed True Zero-Shot (TruZe) split for UCF101 and HMDB51, with zero-shot and generalized zero-shot evaluation.
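As we understand the motivation (an assumption not spelled out in this snippet), a "true" zero-shot split keeps only test classes that do not overlap with the pretraining vocabulary; a toy sketch of that filtering step, with hypothetical class names:

```python
# Hypothetical class names; the point is only the overlap filtering.
pretrain_classes = {"archery", "bowling", "diving"}          # e.g. seen in pretraining
dataset_classes = {"archery", "bowling", "diving", "fencing",
                   "juggling", "knitting"}

unseen = sorted(dataset_classes - pretrain_classes)  # valid zero-shot test classes
seen = sorted(dataset_classes & pretrain_classes)    # remain available for training
print(unseen)  # ['fencing', 'juggling', 'knitting']
```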
Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification
Explainable distances for sequence data depend on temporal alignment to tackle sequences with different lengths and local variances.
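A minimal sketch of an alignment-based sequence distance, using plain dynamic time warping as a stand-in for the learned temporal alignment prediction the paper proposes:

```python
import numpy as np

def dtw(a, b):
    # a: (Ta, D), b: (Tb, D). Cost of the best monotone frame alignment,
    # so sequences with different lengths and local speed variations
    # remain directly comparable.
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Ta, Tb]

x = np.random.randn(12, 32)  # 12-frame clip
y = np.random.randn(20, 32)  # 20-frame clip of the same action, played slower
print(dtw(x, y))
```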
Object-Region Video Transformers
In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.
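A minimal sketch of the object-centric idea: pool one feature per object box and append it to the patch tokens a video transformer block attends over (box handling and token layout here are assumptions, not the ORViT block):

```python
import torch
from torchvision.ops import roi_align

def add_object_tokens(patch_tokens, feature_map, boxes):
    # patch_tokens: (N, D); feature_map: (1, D, H, W);
    # boxes: (M, 4) as (x1, y1, x2, y2) in feature-map coordinates.
    pooled = roi_align(feature_map, [boxes], output_size=1)  # (M, D, 1, 1)
    obj_tokens = pooled.flatten(1)                           # one token per object
    return torch.cat([patch_tokens, obj_tokens], dim=0)      # (N + M, D)

tokens = torch.randn(196, 256)           # 14x14 patch tokens
fmap = torch.randn(1, 256, 14, 14)
boxes = torch.tensor([[1.0, 1.0, 6.0, 6.0], [3.0, 2.0, 10.0, 9.0]])
print(add_object_tokens(tokens, fmap, boxes).shape)  # torch.Size([198, 256])
```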
Revisiting spatio-temporal layouts for compositional action recognition
Recognizing human actions is fundamentally a spatio-temporal reasoning problem, and should be, at least to some extent, invariant to the appearance of the human and the objects involved.
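A toy example of such an appearance-free layout representation (object categories and boxes only; the values are made up):

```python
# Each frame keeps only object categories and normalized (x1, y1, x2, y2)
# boxes, discarding pixels, so visually different instances of one action
# share the same representation.
frame_layouts = [
    [("person", (0.10, 0.20, 0.40, 0.90)), ("ball", (0.55, 0.60, 0.62, 0.68))],
    [("person", (0.12, 0.20, 0.42, 0.90)), ("ball", (0.50, 0.40, 0.57, 0.48))],
    [("person", (0.15, 0.18, 0.45, 0.90)), ("ball", (0.45, 0.20, 0.52, 0.28))],
]
# The ball rising across frames is the kind of cue a layout-based model
# reasons over, independent of how the person or ball looks.
```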