Action Localization
166 papers with code • 0 benchmarks • 4 datasets
Action Localization is finding the spatial and temporal co ordinates for an action in a video. An action localization model will identify which frame an action start and ends in video and return the x,y coordinates of an action. Further the co ordinates will change when the object performing action undergoes a displacement.
Benchmarks
These leaderboards are used to track progress in Action Localization
Libraries
Use these libraries to find Action Localization models and implementationsSubtasks
Most implemented papers
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1. 58M action labels with multiple labels per person occurring frequently.
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization
YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation.
ACGNet: Action Complement Graph Network for Weakly-supervised Temporal Action Localization
Weakly-supervised temporal action localization (WTAL) in untrimmed videos has emerged as a practical but challenging task since only video-level labels are available.
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations.
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Annotating videos is cumbersome, expensive and not scalable.
Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets
Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room.
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization
We propose `Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos.
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.
1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020
This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.