Action Localization
136 papers with code • 0 benchmarks • 3 datasets
Action Localization is the task of finding the spatial and temporal coordinates of an action in a video. An action localization model identifies the frames in which an action starts and ends, and returns the (x, y) coordinates of the action in each frame. These coordinates change as the object performing the action moves.
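The output described above can be sketched as a simple data structure: a temporal extent (start and end frame) plus per-frame spatial coordinates. This is an illustrative, hypothetical representation, not the schema of any particular library or benchmark; the class name, fields, and values are assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class ActionLocalization:
    """Hypothetical container for one localized action in a video."""
    label: str
    start_frame: int   # first frame in which the action occurs
    end_frame: int     # last frame in which the action occurs
    boxes: dict        # frame index -> (x, y, w, h) bounding box

    def duration(self) -> int:
        # number of frames the action spans, inclusive of both endpoints
        return self.end_frame - self.start_frame + 1

# A "running" action detected from frame 10 to 12; the box's
# x coordinate increases as the runner moves to the right,
# illustrating how the spatial coordinates change over time.
det = ActionLocalization(
    label="running",
    start_frame=10,
    end_frame=12,
    boxes={10: (40, 80, 32, 64), 11: (48, 80, 32, 64), 12: (56, 80, 32, 64)},
)

print(det.duration())    # 3
print(det.boxes[12][0])  # 56
```

Real systems typically predict many such tuples per video, each with a confidence score, and evaluate them against ground truth using spatio-temporal intersection-over-union.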
Benchmarks
These leaderboards are used to track progress in Action Localization
Libraries
Use these libraries to find Action Localization models and implementations
Latest papers
Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing
To address this, we focus on improving the proportion of positive segments detected in a video.
Anchor-free temporal action localization via Progressive Boundary-aware Boosting
The TCM aggregates the temporal context information and provides features for the IBM and the FPBM.
Dilation-Erosion for Single-Frame Supervised Temporal Action Localization
To balance the annotation labor and the granularity of supervision, single-frame annotation has been introduced in temporal action localization.
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.
ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022
Moreover, in order to better capture the long-term temporal dependencies in the long videos, we propose a segment-level recurrence mechanism.
Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge
This report describes our submission to the Ego4D Moment Queries Challenge 2022.
A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge
This report describes Badgers@UW-Madison, our submission to the Ego4D Natural Language Queries (NLQ) Challenge.
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
Temporal Action Localization (TAL) methods typically operate on top of feature sequences from a frozen snippet encoder that is pretrained with the Trimmed Action Classification (TAC) tasks, resulting in a task discrepancy problem.
SimOn: A Simple Framework for Online Temporal Action Localization
In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting.
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
The challenges of such capability lie in the difficulty of generating a detailed understanding of situated actions, their effects on object states (i.e., state changes), and their causal dependencies.