Open Vocabulary Action Recognition
3 papers with code • 2 benchmarks • 2 datasets
Open Vocabulary Action Recognition (OVAR) aims to generalize beyond the predefined set of actions seen during training. The actions (verbs or verb-object pairs) are provided as textual queries at inference time, and no prior knowledge of them is assumed during training.
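The inference setup described above can be illustrated with a minimal sketch. It assumes that a video and each textual action query have already been mapped into a shared embedding space (for example by a CLIP-style model). The vectors, query strings, and function names below are hypothetical toy values chosen for illustration and do not come from any of the listed papers.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_actions(video_embedding, text_embeddings):
    """Rank textual action queries by similarity to the video embedding."""
    scores = {query: cosine(video_embedding, emb)
              for query, emb in text_embeddings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical embeddings: in practice a text encoder and a video encoder
# would produce these. The query set is supplied only at inference time,
# so it may contain actions never seen during training.
text_embeddings = {
    "cut onion": [0.9, 0.1, 0.0],
    "pour water": [0.0, 0.2, 0.9],
    "open drawer": [0.1, 0.9, 0.1],
}
video_embedding = [0.8, 0.2, 0.1]

ranking = rank_actions(video_embedding, text_embeddings)
predicted_action = ranking[0][0]  # highest-scoring query
```

Because the label set is just a dictionary of text queries, adding a new action only requires embedding a new string; no classifier head has to be retrained.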
Most implemented papers
Opening the Vocabulary of Egocentric Actions
Given a set of verbs and objects observed during training, the goal is to generalize the verbs to an open vocabulary of actions with seen and novel objects.
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
FROSTER employs a residual feature distillation approach to ensure that CLIP retains its generalization capability while effectively adapting to the action recognition task.
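A rough sketch of what a residual feature distillation loss could look like is given below. It is not taken from the FROSTER paper or code. It assumes that the student feature passes through a small residual branch (here a single hypothetical linear map scaled by a factor `alpha`) and is then pulled toward the feature of a frozen teacher with a mean squared error. The function names, the linear map, and the value of `alpha` are all illustrative assumptions.

```python
def residual_adapter(feature, weights, alpha=0.1):
    """Return feature + alpha * (weights @ feature).

    Assumption: the residual branch is one linear map. With small alpha
    the output stays close to the input feature.
    """
    projected = [sum(w * x for w, x in zip(row, feature)) for row in weights]
    return [x + alpha * p for x, p in zip(feature, projected)]

def distillation_loss(student_feature, teacher_feature, weights, alpha=0.1):
    """Mean squared error between adapted student and frozen teacher features."""
    adapted = residual_adapter(student_feature, weights, alpha)
    squared_errors = [(a - t) ** 2 for a, t in zip(adapted, teacher_feature)]
    return sum(squared_errors) / len(squared_errors)

# Toy 2-dimensional features (hypothetical values).
zero_weights = [[0.0, 0.0], [0.0, 0.0]]  # residual branch contributes nothing
teacher = [1.0, 2.0]                     # feature from the frozen teacher

loss_same = distillation_loss([1.0, 2.0], teacher, zero_weights)
loss_drift = distillation_loss([2.0, 2.0], teacher, zero_weights)
```

In this toy setup the loss is zero when the student feature matches the teacher and grows as the student drifts away, which is the pressure that keeps the adapted model near the frozen model's feature space.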
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
This paper establishes a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action and conducts a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps.