Action Localization
135 papers with code • 0 benchmarks • 3 datasets
Action Localization is the task of finding the spatial and temporal coordinates of an action in a video. An action localization model identifies the frames in which an action starts and ends, and returns the x, y coordinates of the action in each of those frames. These coordinates change over time as the object performing the action moves.
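The output described above can be sketched as a simple data structure: a label, a temporal interval, and one bounding box per frame so the spatial coordinates can follow the moving actor. This is a minimal illustration; the class and field names are hypothetical, not from any particular library.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

@dataclass
class ActionLocalization:
    """Hypothetical container for one localized action instance."""
    label: str
    start_frame: int
    end_frame: int
    # One box per frame in [start_frame, end_frame]; the box moves
    # frame to frame because the actor moves.
    boxes: List[Box]

    def box_at(self, frame: int) -> Box:
        """Return the spatial coordinates of the action at a given frame."""
        if not (self.start_frame <= frame <= self.end_frame):
            raise ValueError(f"frame {frame} outside action interval")
        return self.boxes[frame - self.start_frame]

# Example: a "jumping" action spanning frames 10-12, with the box
# drifting rightward and upward as the person jumps.
jump = ActionLocalization(
    label="jumping",
    start_frame=10,
    end_frame=12,
    boxes=[(40.0, 80.0, 30.0, 60.0),
           (42.0, 70.0, 30.0, 60.0),
           (44.0, 60.0, 30.0, 60.0)],
)
print(jump.box_at(11))  # (42.0, 70.0, 30.0, 60.0)
```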
Benchmarks
These leaderboards are used to track progress in Action Localization
Libraries
Use these libraries to find Action Localization models and implementations.
Latest papers
Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training.
Boosting Weakly-Supervised Temporal Action Localization with Text Information
For the discriminative objective, we propose a Text-Segment Mining (TSM) mechanism, which constructs a text description based on the action class label, and regards the text as the query to mine all class-related segments.
Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint
The proposed Bi-SCC first applies a temporal context augmentation to generate an augmented video that breaks the correlation between positive actions and their co-scene actions across videos; a semantic consistency constraint (SCC) then enforces consistent predictions between the original and augmented videos, thereby suppressing co-scene actions.
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels
Besides, the generated pseudo-labels can be fluctuating and inaccurate at the early stage of training.
WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition
Though research has shown the complementarity of camera- and inertial-based data, datasets which offer both egocentric video and inertial-based sensor data remain scarce.
TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization
To this end, we introduce TemporalMaxer, which minimizes long-term temporal context modeling while maximizing information from the extracted video clip features with a basic, parameter-free, and local region operating max-pooling block.
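The max-pooling idea above is parameter-free and operates on local temporal regions of extracted clip features. A rough sketch of such a block, assuming features are a `(T, C)` array of per-clip vectors (function name and window parameters are illustrative, not the paper's actual interface):

```python
import numpy as np

def temporal_max_pool(features: np.ndarray, kernel: int = 3, stride: int = 2) -> np.ndarray:
    """Parameter-free local max pooling over the temporal axis.

    features: (T, C) array, one C-dimensional feature per video clip.
    Returns a shorter (T', C) sequence, keeping the strongest
    activation per channel within each local temporal window.
    """
    T, _ = features.shape
    windows = []
    for start in range(0, T - kernel + 1, stride):
        windows.append(features[start:start + kernel].max(axis=0))
    return np.stack(windows)

# Toy example: 6 clips with 2-channel features.
feats = np.arange(12, dtype=float).reshape(6, 2)
pooled = temporal_max_pool(feats, kernel=3, stride=2)
print(pooled.shape)  # (2, 2)
```

Because the block has no learnable weights, it adds no parameters to the model; it simply downsamples the temporal axis while preserving the most salient local responses.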
Faster Learning of Temporal Action Proposal via Sparse Multilevel Boundary Generator
Temporal action localization in videos presents significant challenges in the field of computer vision.
Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events
Understanding and analyzing human behaviors (actions and interactions of people), voices, and sounds in chaotic events is crucial in many applications, e.g., crowd management and emergency response services.
Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization
To this end, we propose a unified framework, termed Noisy Pseudo-Label Learning, to handle both location biases and category errors.
Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing
To address this, we focus on improving the proportion of positive segments detected in a video.