Weakly Supervised Action Localization

15 papers with code • 5 benchmarks • 2 datasets

In this task, the training data consists of videos annotated only with the list of activities they contain, without any temporal boundary annotations. At test time, however, given a video, the algorithm must recognize the activities present and also predict their start and end times.
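The setting above is typically cast as multiple-instance learning: a model scores every frame for every class, pools those scores into video-level predictions for training, and thresholds the per-frame scores at test time to recover segments. A minimal sketch of that pipeline, with illustrative pooling and threshold choices (top-k pooling and a fixed 0.5 threshold are assumptions, not a specific paper's method):

```python
import numpy as np

def video_level_scores(cas, k=3):
    """Aggregate a temporal class activation sequence (T x C) into
    video-level class scores by averaging the top-k frame scores
    per class (a common MIL-style pooling choice)."""
    topk = np.sort(cas, axis=0)[-k:]       # top-k frame scores per class
    return topk.mean(axis=0)               # (C,) video-level scores

def localize(cas, class_idx, thresh=0.5, fps=1.0):
    """Threshold one class's activation track and return contiguous
    (start, end) segments in seconds."""
    active = cas[:, class_idx] > thresh
    segments, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t
        elif not a and start is not None:
            segments.append((start / fps, t / fps))
            start = None
    if start is not None:
        segments.append((start / fps, len(active) / fps))
    return segments

# toy example: 10 frames, 2 classes; class 0 is active on frames 3-6
cas = np.zeros((10, 2))
cas[3:7, 0] = 0.9
scores = video_level_scores(cas)       # class 0 clearly dominant
segments = localize(cas, 0)            # -> [(3.0, 7.0)]
```

Training only ever supervises `video_level_scores` against the video-level label list; `localize` is used at test time, which is exactly why the temporal boundaries are "weakly" supervised.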

Greatest papers with code

Background Suppression Network for Weakly-supervised Temporal Action Localization

Pilhyeon/BaSNet-pytorch • 22 Nov 2019

This formulation does not fully model the problem in that background frames are forced to be misclassified as action classes to predict video-level labels accurately.

Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization, +1
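One way to avoid forcing background frames into action classes is to learn a per-frame foreground attention that suppresses background before pooling, comparing a suppressed branch against an unsuppressed one. The sketch below is an assumed simplification in the spirit of background suppression, not BaS-Net's actual architecture; the attention values are hand-set for illustration:

```python
import numpy as np

def pooled_scores(cas, attention=None, k=2):
    """Top-k mean pooling of a (T x C) activation sequence, optionally
    modulated by a per-frame foreground attention weight (T,).
    Attention near 0 suppresses a frame's contribution entirely."""
    if attention is not None:
        cas = cas * attention[:, None]     # down-weight background frames
    topk = np.sort(cas, axis=0)[-k:]
    return topk.mean(axis=0)

# toy sequence: frames 3-4 are background yet score highly on class 0,
# which is exactly the failure mode the snippet above describes
cas = np.array([[0.2, 0.1],
                [0.3, 0.1],
                [0.2, 0.2],
                [0.9, 0.1],   # background frame, spuriously high
                [0.8, 0.1],   # background frame, spuriously high
                [0.3, 0.2],
                [0.2, 0.1],
                [0.1, 0.1]])
attention = np.array([1, 1, 1, 0.05, 0.05, 1, 1, 1.0])

base = pooled_scores(cas)              # branch without suppression
supp = pooled_scores(cas, attention)   # branch with suppression
```

With suppression, the spurious background frames no longer dominate the top-k pool, so the video-level score for class 0 drops to what the genuine action frames support.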

Weakly-supervised Temporal Action Localization by Uncertainty Modeling

Pilhyeon/WTAL-Uncertainty-Modeling • 12 Jun 2020

Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles.

Action Classification, Multiple Instance Learning, +4
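A common realization of this idea is to treat background frames as out-of-distribution and use feature magnitude as the uncertainty signal: action frames are encouraged to have large feature norms, background frames small ones. The function below is a minimal sketch of that magnitude-to-probability mapping (the linear squashing with a magnitude ceiling `m` is an illustrative assumption):

```python
import numpy as np

def action_probability(features, m=10.0):
    """Map per-frame feature magnitude (rows of a T x D matrix) to a
    pseudo-probability that each frame contains an action. Frames with
    norm >= m are treated as certainly foreground."""
    norms = np.linalg.norm(features, axis=1)
    return np.clip(norms / m, 0.0, 1.0)

# toy features: 4 action-like frames (large magnitude),
# then 4 background-like frames (small magnitude)
feats = np.vstack([
    np.ones((4, 8)) * 3.0,
    np.ones((4, 8)) * 0.1,
])
p = action_probability(feats)   # high for the first 4, low for the last 4
```

Because no frame-level labels exist, such a score is trained indirectly, e.g. by maximizing the magnitude gap between pseudo-action and pseudo-background frames mined from the video-level labels.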

Guess Where? Actor-Supervision for Spatiotemporal Action Localization

escorciav/roi_pooling • 5 Apr 2018

Second, we propose an actor-based attention mechanism that enables the localization of the actions from action class labels and actor proposals and is end-to-end trainable.

Action Localization, Weakly Supervised Action Localization
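The core mechanic of such an attention mechanism can be sketched as a softmax over actor proposals, conditioned on the video-level action label: proposals whose features score highly for the labeled class receive most of the attention mass and thereby localize the action. This is a hedged simplification with toy scores, not the paper's trainable module:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())    # shift for numerical stability
    return e / e.sum()

def attend(proposal_scores, class_idx):
    """Attention weights over actor proposals (N x C score matrix)
    for one action class: a softmax over that class's column."""
    return softmax(proposal_scores[:, class_idx])

# toy per-proposal class scores for 3 actor proposals, 2 classes
scores = np.array([[2.0, 0.1],   # proposal 0: strong for class 0
                   [0.1, 0.1],   # proposal 1: weak everywhere
                   [0.2, 2.0]])  # proposal 2: strong for class 1
w = attend(scores, 0)            # most mass on proposal 0
```

Because the attention is driven only by the class label, the whole pipeline can be trained end-to-end from video-level supervision, with the attended proposals serving as the spatiotemporal localization output.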

3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization

naraysa/3c-net • ICCV 2019

Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the action feature discriminability and a counting loss term to delineate adjacent action sequences, leading to improved localization.

Action Classification, Weakly Supervised Action Localization, +2
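The three-term structure of such an objective can be sketched as follows; this is an assumed simplification in the spirit of the description above (binary cross-entropy for classification, squared distance to class centers, squared error on instance counts), with illustrative weights `alpha` and `beta`, not 3C-Net's exact losses:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Classification term: binary cross-entropy on video-level scores."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

def center_loss(features, labels, centers):
    """Center-loss term: mean squared distance of each feature to its
    class center, encouraging intra-class compactness."""
    return float(((features - centers[labels]) ** 2).sum(axis=1).mean())

def counting_loss(pred_count, true_count):
    """Counting term: squared error on the number of action instances,
    which helps separate adjacent action sequences."""
    return float(((pred_count - true_count) ** 2).mean())

def total_loss(pred, target, features, labels, centers,
               pred_count, true_count, alpha=0.1, beta=0.1):
    return (bce(pred, target)
            + alpha * center_loss(features, labels, centers)
            + beta * counting_loss(pred_count, true_count))
```

When features sit exactly on their class centers and counts match, only the classification term contributes, which makes the relative effect of each term easy to probe on toy inputs.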

Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization

kylemin/A2CL-PT • ECCV 2020

Two triplets of the feature space are considered in our approach: one triplet is used to learn discriminative features for each activity class, and the other is used to distinguish the features where no activity occurs (i.e., background features) from activity-related features for each video.

Metric Learning, Weakly Supervised Action Localization, +1
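Both triplet types reduce to the standard triplet margin loss, differing only in how the negative is chosen: another activity class for the inter-class triplet, and a background feature from the same video for the second one. A minimal sketch with hand-picked 2-D toy features (the specific vectors and margin are illustrative, and the adversarial component of the paper is omitted):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: pull anchor toward positive,
    push it at least `margin` farther from negative."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

# inter-class triplet: anchor/positive from activity A, negative from B
act_a1 = np.array([1.0, 0.0])
act_a2 = np.array([0.9, 0.1])
act_b = np.array([0.0, 1.0])
inter = triplet_loss(act_a1, act_a2, act_b)   # already well separated

# background triplet: same anchor/positive, negative is a background
# feature from the same video, still too close to the activity cluster
bg = np.array([0.5, 0.5])
intra = triplet_loss(act_a1, act_a2, bg)      # positive loss: push bg away
```

In this toy setup the inter-class triplet is already satisfied (zero loss) while the background triplet still incurs a penalty, mirroring the observation that background frames, not other classes, are the harder negatives in this task.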