Weakly Supervised Action Localization
32 papers with code • 8 benchmarks • 5 datasets
In this task, the training data consists of videos labeled only with the list of activities they contain, without any temporal boundary annotations. At test time, given a video, the algorithm must recognize the activities in the video and also predict their start and end times.
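A minimal sketch of the standard weakly supervised pipeline shared by many of the papers below (a simplified illustration, not any single paper's method): score every snippet per class, pool the snippet scores into a video-level prediction for training against the video-level labels, then threshold the per-snippet scores at test time to recover start/end boundaries.

```python
import numpy as np

def topk_pool(snippet_scores, k):
    """Video-level class scores via top-k mean pooling over time: (T, C) -> (C,)."""
    top = np.sort(snippet_scores, axis=0)[-k:]  # k highest scores per class
    return top.mean(axis=0)

def localize(snippet_scores, cls, threshold, fps=1.0):
    """Threshold one class's temporal scores and merge consecutive active
    snippets into (start_sec, end_sec) segments."""
    active = snippet_scores[:, cls] > threshold
    segments, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t
        elif not a and start is not None:
            segments.append((start / fps, t / fps))
            start = None
    if start is not None:  # segment runs to the end of the video
        segments.append((start / fps, len(active) / fps))
    return segments

# Toy example: 8 snippets, 2 classes; class 0 is active in snippets 2-4.
scores = np.zeros((8, 2))
scores[2:5, 0] = 0.9
video_level = topk_pool(scores, k=3)           # trained against video labels
segments = localize(scores, cls=0, threshold=0.5)
print(segments)  # [(2.0, 5.0)]
```

The pooling step is what makes training possible without boundaries: only the pooled video-level score is compared to the video-level label, while localization falls out of the per-snippet scores.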
Benchmarks
These leaderboards are used to track progress in Weakly Supervised Action Localization
Libraries
Use these libraries to find Weakly Supervised Action Localization models and implementations.
Most implemented papers
W-TALC: Weakly-supervised Temporal Activity Localization and Classification
Most activity localization methods in the literature suffer from the burden of frame-wise annotation requirement.
RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization
RefineLoc shows competitive results with the state-of-the-art in weakly-supervised temporal localization.
Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization
In this work, we first identify two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation.
3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization
Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the discriminability of the action features, and a counting loss term to delineate adjacent action sequences, leading to improved localization.
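The three terms can be sketched in simplified form (illustrative stand-ins with hypothetical shapes and weights, not the paper's exact formulation):

```python
import numpy as np

def classification_loss(video_scores, labels):
    """Multi-label binary cross-entropy over per-class video scores (C,)."""
    p = 1.0 / (1.0 + np.exp(-video_scores))
    return -np.mean(labels * np.log(p + 1e-8) + (1 - labels) * np.log(1 - p + 1e-8))

def center_loss(features, labels, centers):
    """Mean squared distance of features (N, D) to their class centers (C, D)."""
    return np.mean(np.sum((features - centers[labels]) ** 2, axis=1))

def counting_loss(predicted_count, true_count):
    """Squared error between predicted and annotated action-instance counts."""
    return float((predicted_count - true_count) ** 2)

# Hypothetical weights combining the three terms into one objective.
rng = np.random.default_rng(0)
total = (classification_loss(np.array([2.0, -1.5]), np.array([1.0, 0.0]))
         + 0.5 * center_loss(rng.standard_normal((4, 8)),
                             np.array([0, 0, 1, 1]),
                             np.zeros((2, 8)))
         + 0.1 * counting_loss(2.8, 3))
```

The counting term is the distinctive piece: matching how many instances the model predicts to the annotated count gives a training signal about segment boundaries that the video-level label alone cannot provide.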
SF-Net: Single-Frame Supervision for Temporal Action Localization
To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action.
Weakly-Supervised Action Localization by Generative Attention Modeling
By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning
Weakly-supervised action localization requires training a model to localize the action segments in a video given only video-level action labels.
Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization
Two triplets of the feature space are considered in our approach: one triplet is used to learn discriminative features for each activity class, and the other is used to distinguish the features where no activity occurs (i.e., background features) from activity-related features for each video.
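A standard triplet margin loss illustrates the mechanism behind both triplets (the paper's specific triplet construction differs; this is only the generic building block): pull an anchor toward a positive of the same class and push it away from a negative, such as a background feature, by at least a margin.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss: hinge on (d_pos - d_neg + margin)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0])   # anchor: an activity feature
p = np.array([0.9, 0.1])   # positive: same activity class
n = np.array([-1.0, 0.0])  # negative: background feature
print(triplet_loss(a, p, n))  # 0.0 (negative already pushed past the margin)
```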
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization.
Temporal Action Segmentation from Timestamp Supervision
To demonstrate the effectiveness of timestamp supervision, we propose an approach to train a segmentation model using only timestamp annotations.