Weakly Supervised Action Localization
26 papers with code • 8 benchmarks • 5 datasets
In this task, the training data consists of videos labeled only with the list of activities they contain, without any temporal boundary annotations. At test time, however, given a video, the algorithm should recognize the activities in it and also predict the start and end time of each one.
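To make the expected test-time output concrete, here is a minimal sketch of one common post-processing step: turning per-frame scores for a single action class into (start, end) segments by thresholding. It assumes a model has already produced such scores; the function name `scores_to_segments`, the fixed threshold, and the frame rate argument are illustrative assumptions, not something this task definition prescribes.

```python
from typing import List, Tuple


def scores_to_segments(
    scores: List[float], fps: float, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Group consecutive frames scoring >= threshold into (start, end) times in seconds.

    `scores` holds hypothetical per-frame confidences for one action class.
    """
    segments = []
    start = None  # index of the first frame of the segment currently open
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new segment begins at this frame
        elif score < threshold and start is not None:
            segments.append((start / fps, i / fps))  # segment ended before frame i
            start = None
    if start is not None:
        # the video ended while a segment was still open
        segments.append((start / fps, len(scores) / fps))
    return segments


# Example: six frames sampled at 1 frame per second
example = scores_to_segments([0.1, 0.8, 0.9, 0.2, 0.7, 0.6], fps=1.0)
```

During training only the video-level label is available, so the per-frame scores themselves have to be learned indirectly; the sketch covers only the final step that produces start and end times.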
Most implemented papers
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization
We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos.
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets
Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room.
UntrimmedNets for Weakly Supervised Action Recognition and Detection
We exploit the learned models for action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet.
Guess Where? Actor-Supervision for Spatiotemporal Action Localization
Second, we propose an actor-based attention mechanism that enables the localization of the actions from action class labels and actor proposals and is end-to-end trainable.
Background Suppression Network for Weakly-supervised Temporal Action Localization
This formulation does not fully model the problem in that background frames are forced to be misclassified as action classes to predict video-level labels accurately.
Weakly-supervised Temporal Action Localization by Uncertainty Modeling
Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles.
VideoMix: Rethinking Data Augmentation for Video Classification
Recent data augmentation strategies have been reported to address the overfitting problems in static image classifiers.
ACM-Net: Action Context Modeling Network for Weakly-Supervised Temporal Action Localization
Traditional methods mainly focus on foreground and background frames separation with only a single attention branch and class activation sequence.
W-TALC: Weakly-supervised Temporal Activity Localization and Classification
Most activity localization methods in the literature suffer from the burden of frame-wise annotation requirement.