Few Shot Action Recognition

25 papers with code • 4 benchmarks • 5 datasets

Few-shot (FS) action recognition is a challenging computer vision problem in which the task is to classify an unlabelled query video into one of the action categories of a support set that contains only a few labelled samples per action class.
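For concreteness, here is a minimal episodic-classification sketch (a generic nearest-prototype baseline, not any specific paper's method); the `classify_query` helper and the assumption that video embeddings are already extracted are purely illustrative:

```python
import torch

def classify_query(query_feat, support_feats, support_labels, n_way):
    """Nearest-prototype few-shot classification (hypothetical baseline).

    query_feat:     (D,) embedding of the unlabelled query video
    support_feats:  (n_way * k_shot, D) embeddings of the support videos
    support_labels: (n_way * k_shot,) integer class ids in [0, n_way)
    """
    # One prototype per class: the mean of its support embeddings.
    prototypes = torch.stack([
        support_feats[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                                    # (n_way, D)
    # Assign the query to the class with the nearest prototype.
    dists = torch.cdist(query_feat[None], prototypes)[0]  # (n_way,)
    return int(dists.argmin())
```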

Most implemented papers

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

tobyperrett/trx CVPR 2021

We propose a novel approach to few-shot action recognition, finding temporally-corresponding frame tuples between the query and videos in the support set.
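A simplified sketch of the tuple idea, assuming per-frame features are already extracted (the `frame_tuples` helper is hypothetical and omits the paper's CrossTransformer attention): ordered frame-index tuples are enumerated and their features concatenated, so query tuples can be compared against support tuples of the same cardinality.

```python
import itertools
import torch

def frame_tuples(video_feats, tuple_size=2):
    """Build ordered frame-tuple representations (simplified TRX-style sketch).

    video_feats: (T, D) per-frame features -> (num_tuples, tuple_size * D)
    """
    T = video_feats.shape[0]
    indices = itertools.combinations(range(T), tuple_size)  # ordered index tuples
    return torch.stack([video_feats[list(i)].reshape(-1) for i in indices])

# Query tuples can then be matched against all support tuples of a class,
# e.g. via cosine similarity, and the per-class match scores aggregated.
```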

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

mcg-nju/trace 15 Dec 2019

Next, by decomposing and learning the temporal changes in visual relationships that result in an action, we demonstrate the utility of a hierarchical event decomposition by enabling few-shot action recognition, achieving 42.7% mAP using as few as 10 examples.

Few-shot Action Recognition with Permutation-invariant Attention

Teddy00888/arn_mindspore ECCV 2020

Such encoded blocks are aggregated by permutation-invariant pooling to make our approach robust to varying action lengths and long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class.
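The pooling step can be illustrated with two classic order-independent aggregators (a minimal sketch; the paper's attention-based second-order pooling is more elaborate, and `permutation_invariant_pool` is a hypothetical name):

```python
import torch

def permutation_invariant_pool(block_feats):
    """Aggregate a variable number of encoded temporal blocks into
    fixed-size descriptors; both outputs are invariant to block order.

    block_feats: (N, D)
    """
    first_order = block_feats.mean(dim=0)                            # (D,)
    second_order = block_feats.t() @ block_feats / len(block_feats)  # (D, D)
    return first_order, second_order
```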

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

lovelyqian/AMeFu-Net 20 Oct 2020

Humans can easily recognize actions given only a few examples, while existing video recognition models still rely heavily on large-scale labeled data.

Few-shot Action Recognition with Prototype-centered Attentive Learning

tobyperrett/few-shot-action-recognition 20 Jan 2021

Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.

Home Action Genome: Cooperative Compositional Action Understanding

nishantrai18/homage CVPR 2021

However, there remains a lack of studies that extend action composition and leverage multiple viewpoints and multiple modalities of data for representation learning.

TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition

R00Kie-Liu/TA2N 10 Jul 2021

The first stage locates the action by learning a temporal affine transform, which warps each video feature to its action duration while discarding action-irrelevant features (e.g., background).
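The warp can be pictured as a 1D spatial-transformer-style resampling (a sketch under the assumption that a small localisation network predicts the scale and shift; `temporal_affine_warp` is a hypothetical helper, not the paper's code):

```python
import torch

def temporal_affine_warp(feats, scale, shift):
    """Resample per-frame features along the affine time map
    t' = scale * t + shift, using linear interpolation, so the output
    grid covers the predicted action span.

    feats: (T, D); scale, shift: scalars in normalised time [0, 1].
    """
    T = feats.shape[0]
    t = torch.linspace(0, 1, T)                      # output time grid
    src = (scale * t + shift).clamp(0, 1) * (T - 1)  # fractional source index
    lo = src.floor().long().clamp(max=T - 1)
    hi = (lo + 1).clamp(max=T - 1)
    w = (src - lo.float()).unsqueeze(1)              # interpolation weights
    return (1 - w) * feats[lo] + w * feats[hi]
```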

A New Split for Evaluating True Zero-Shot Action Recognition

kini5gowda/TruZe 27 Jul 2021

We benchmark several recent approaches on the proposed True Zero-Shot (TruZe) Split for UCF101 and HMDB51, with zero-shot and generalized zero-shot evaluation.

Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification

BingSu12/TAP ICLR 2022

Explainable distances for sequence data rely on temporal alignment to handle sequences of different lengths and local variations.
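Classic dynamic time warping illustrates what such alignment-based distances compute (TAP learns to predict alignments rather than solving this dynamic program at test time; the sketch below is textbook DTW, not the paper's model):

```python
import torch

def dtw_distance(x, y):
    """Dynamic-time-warping distance between feature sequences of
    possibly different lengths. x: (Tx, D), y: (Ty, D)."""
    Tx, Ty = x.shape[0], y.shape[0]
    cost = torch.full((Tx + 1, Ty + 1), float('inf'))
    cost[0, 0] = 0.0
    for i in range(1, Tx + 1):
        for j in range(1, Ty + 1):
            d = torch.norm(x[i - 1] - y[j - 1])  # local match cost
            cost[i, j] = d + torch.min(torch.stack(
                [cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]]))
    return cost[Tx, Ty]
```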

Object-Region Video Transformers

eladb3/orvit CVPR 2022

In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.
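Schematically, the block appends object-region tokens to the patch tokens before self-attention, letting attention mix object and scene information (a rough sketch assuming RoI-pooled object features are available; the actual ORViT block also injects box-trajectory embeddings):

```python
import torch
import torch.nn as nn

def object_region_attention(patch_tokens, object_tokens,
                            attn: nn.MultiheadAttention):
    """Self-attention over the concatenation of patch and object tokens
    (hypothetical simplification of an ORViT-style block).

    patch_tokens:  (N_patches, D)
    object_tokens: (N_objects, D), e.g. RoI-aligned features of tracked boxes
    """
    tokens = torch.cat([patch_tokens, object_tokens], dim=0).unsqueeze(1)  # (N, 1, D)
    out, _ = attn(tokens, tokens, tokens)
    return out[: patch_tokens.shape[0], 0]  # keep the refined patch tokens
```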