By maximizing the conditional class probability with respect to the attention weights, action and non-action frames become well separated.
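A minimal sketch of this idea, assuming per-frame features and a video-level label: attention-weighted temporal pooling feeds a classifier, and maximizing the video-level class probability sharpens the attention on action frames. The module names and dimensions are illustrative only.

```python
# Minimal sketch: attention-weighted temporal pooling for video-level
# classification. Maximizing the video-level class probability pushes the
# attention toward action frames and away from non-action (background) frames.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPoolClassifier(nn.Module):
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        self.attention = nn.Linear(feat_dim, 1)       # per-frame attention score
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):                    # frame_feats: (T, feat_dim)
        attn = torch.softmax(self.attention(frame_feats), dim=0)  # (T, 1)
        video_feat = (attn * frame_feats).sum(dim=0)   # attention-weighted pooling
        return self.classifier(video_feat), attn

model = AttentionPoolClassifier()
feats = torch.randn(100, 1024)                         # 100 frames (illustrative)
logits, attn = model(feats)
loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([3]))  # video-level label
loss.backward()                                        # gradients refine the attention
```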
We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances.
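A rough sketch of such a two-module design, assuming precomputed segment features; the triplet loss stands in for the metric learning objective and all names and sizes are assumptions for illustration.

```python
# Sketch (assumed architecture): a per-segment classification head plus a
# metric learning head trained with a triplet loss, so embeddings of the same
# action class are pulled together and different classes pushed apart.
import torch
import torch.nn as nn

class SegmentModel(nn.Module):
    def __init__(self, feat_dim=1024, embed_dim=128, num_classes=20):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)   # classification module
        self.embedder = nn.Linear(feat_dim, embed_dim)        # metric learning module

    def forward(self, seg_feats):                             # seg_feats: (N, feat_dim)
        return self.classifier(seg_feats), self.embedder(seg_feats)

model = SegmentModel()
anchor, positive, negative = (torch.randn(8, 1024) for _ in range(3))
labels = torch.randint(0, 20, (8,))

logits, emb_a = model(anchor)
_, emb_p = model(positive)
_, emb_n = model(negative)

cls_loss = nn.functional.cross_entropy(logits, labels)        # segment action labels
metric_loss = nn.TripletMarginLoss(margin=1.0)(emb_a, emb_p, emb_n)
loss = cls_loss + metric_loss
```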
Annotating videos is cumbersome, expensive and not scalable.
Spatio-temporal action localization is a challenging yet fascinating task that aims to detect and classify human actions in video clips.
In this report, we introduce our winning method for the HACS Temporal Action Localization Challenge 2019.
This formulation does not fully model the problem in that background frames are forced to be misclassified as action classes to predict video-level labels accurately.
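To make the issue concrete, here is a small sketch of the standard weakly-supervised multiple-instance pooling it refers to, with top-k pooling as an assumed choice: only video-level labels supervise the model, so background frames have no class of their own and can only be explained through action classes.

```python
# Sketch of the usual weakly-supervised formulation: per-frame class scores are
# pooled (here top-k average, an illustrative choice) into a video-level
# prediction. Background frames are never supervised directly, so fitting the
# video-level label pushes their scores toward the action classes.
import torch
import torch.nn.functional as F

T, num_classes, k = 100, 20, 8
frame_scores = torch.randn(T, num_classes, requires_grad=True)   # per-frame class logits

topk_scores, _ = frame_scores.topk(k, dim=0)                     # (k, num_classes)
video_logits = topk_scores.mean(dim=0)                           # video-level logits
video_label = torch.zeros(num_classes)
video_label[3] = 1.0                                             # video-level action label

loss = F.binary_cross_entropy_with_logits(video_logits, video_label)
loss.backward()
```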
YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation.
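A rough sketch in the spirit of this two-branch, single-stage design: a 3D branch extracts temporal features from the clip, a 2D branch extracts spatial features from the key frame, and a fused head predicts boxes and action probabilities in one pass. The backbones, channel sizes, and fusion below are illustrative assumptions, not the actual YOWO components.

```python
# Illustrative two-branch, single-stage detector sketch (stand-in layers only).
import torch
import torch.nn as nn

class TwoBranchDetector(nn.Module):
    def __init__(self, num_classes=24, num_anchors=5):
        super().__init__()
        self.branch_3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)   # temporal branch (stand-in)
        self.branch_2d = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # spatial branch (stand-in)
        out_ch = num_anchors * (4 + 1 + num_classes)                  # box, objectness, classes
        self.head = nn.Conv2d(128, out_ch, kernel_size=1)

    def forward(self, clip):                          # clip: (B, 3, T, H, W)
        temporal = self.branch_3d(clip).mean(dim=2)   # collapse time -> (B, 64, H, W)
        spatial = self.branch_2d(clip[:, :, -1])      # key frame   -> (B, 64, H, W)
        fused = torch.cat([temporal, spatial], dim=1)
        return self.head(fused)                       # dense box/action predictions

preds = TwoBranchDetector()(torch.randn(2, 3, 16, 224, 224))
print(preds.shape)                                    # (2, 145, 224, 224)
```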
We then apply GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization.
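A minimal sketch of a graph convolution over proposals, assuming precomputed proposal features; the similarity-based adjacency is an illustrative stand-in for however the proposal graph is actually constructed.

```python
# Proposals are nodes, edges encode their relations (here a simple
# feature-similarity adjacency as a stand-in), and one GCN layer aggregates
# neighbour features before classification/localization heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProposalGCNLayer(nn.Module):
    def __init__(self, in_dim=1024, out_dim=512):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                  # x: (N, in_dim), adj: (N, N)
        adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-6)  # row-normalize
        return F.relu(self.weight(adj @ x))     # aggregate neighbours, then transform

proposals = torch.randn(30, 1024)               # 30 proposal features (illustrative)
sim = torch.relu(torch.cosine_similarity(       # assumed similarity-based edges
    proposals.unsqueeze(1), proposals.unsqueeze(0), dim=-1))
node_feats = ProposalGCNLayer()(proposals, sim)
print(node_feats.shape)                         # (30, 512)
```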
Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance action feature discriminability, and a counting loss term to delineate adjacent action sequences, leading to improved localization.
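A sketch of a three-term objective of this kind; the exact loss forms and weights below are assumptions for illustration, not the paper's definitions.

```python
# Illustrative three-term objective: a multi-label classification loss, a
# center-style loss pulling action features toward per-class centers, and a
# counting loss regressing the number of instances per class.
import torch
import torch.nn.functional as F

num_classes, feat_dim = 20, 128
centers = torch.randn(num_classes, feat_dim, requires_grad=True)  # learnable class centers

def total_loss(video_logits, video_labels, action_feats, feat_labels, count_pred, count_gt):
    cls_loss = F.binary_cross_entropy_with_logits(video_logits, video_labels)
    center_loss = ((action_feats - centers[feat_labels]) ** 2).sum(dim=1).mean()
    count_loss = F.mse_loss(count_pred, count_gt)
    return cls_loss + 0.5 * center_loss + 0.1 * count_loss        # weights are illustrative

loss = total_loss(
    video_logits=torch.randn(4, num_classes),
    video_labels=torch.randint(0, 2, (4, num_classes)).float(),
    action_feats=torch.randn(16, feat_dim),
    feat_labels=torch.randint(0, num_classes, (16,)),
    count_pred=torch.rand(4, num_classes),
    count_gt=torch.randint(0, 3, (4, num_classes)).float(),
)
loss.backward()  # the centers receive gradients through the center term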
SOTA for Action Classification on THUMOS’14
In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations.
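A minimal sketch of learning a joint video-text embedding from (clip, narration) pairs with a simple symmetric contrastive objective; the encoders, feature dimensions, and temperature below are illustrative stand-ins, not the paper's exact model.

```python
# Dual-encoder sketch: project clip and narration features into a shared space
# and train with a contrastive loss where the matching narration is positive.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    def __init__(self, video_dim=1024, text_dim=300, embed_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, video_feats, text_feats):
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return v, t

model = DualEncoder()
video = torch.randn(32, 1024)                  # pooled clip features (illustrative)
text = torch.randn(32, 300)                    # narration features (illustrative)
v, t = model(video, text)

logits = v @ t.T / 0.07                         # similarity matrix with temperature
targets = torch.arange(32)                      # i-th narration matches i-th clip
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
loss.backward()
```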
SOTA for Video Retrieval on MSR-VTT