Weakly Supervised Temporal Action Localization
23 papers with code • 1 benchmark • 2 datasets
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
This formulation does not fully model the problem in that background frames are forced to be misclassified as action classes to predict video-level labels accurately.
Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles.
Traditional methods mainly focus on separating foreground frames from background frames, using only a single attention branch and a class activation sequence.
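To make the single-branch setup concrete, here is a minimal sketch of how a class activation sequence and per-frame attention weights might be combined and thresholded into action segments. The function name `localize`, the fixed `threshold`, and the mean-score segment confidence are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def localize(cas, attention, threshold=0.5):
    """Group consecutive above-threshold frames into (start, end, score) segments.

    cas:       (T,) class activation sequence for one action class (assumed in [0, 1])
    attention: (T,) foreground attention weights (assumed in [0, 1])
    """
    scores = np.asarray(cas, dtype=float) * np.asarray(attention, dtype=float)
    mask = scores > threshold
    segments = []
    start = None
    for t, on in enumerate(mask):
        if on and start is None:
            start = t  # a new segment opens at frame t
        elif not on and start is not None:
            # segment covers frames start..t-1 inclusive
            segments.append((start, t - 1, float(scores[start:t].mean())))
            start = None
    if start is not None:
        # close a segment that runs to the last frame
        segments.append((start, len(mask) - 1, float(scores[start:].mean())))
    return segments
```

Real systems typically add steps this sketch omits, such as multiple thresholds and non-maximum suppression over the resulting proposals.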
In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.
In this work, we first identify two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation.
Our joint formulation has three terms: a classification term to ensure the separability of the learned action features, an adapted multi-label center loss term to enhance the discriminability of the action features, and a counting loss term to delineate adjacent action sequences, leading to improved localization.
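A three-term objective of this kind could be sketched as below. This is only an illustration under stated assumptions: the weights `w_center` and `w_count`, the tensor shapes, the use of binary cross-entropy for the classification term, and the squared-error form of the center and counting terms are all hypothetical choices, not the authors' exact formulation.

```python
import numpy as np

def joint_loss(logits, labels, class_feats, centers, count_pred, count_true,
               w_center=0.1, w_count=0.1):
    """Weighted sum of classification, center, and counting terms.

    logits, labels:         (C,) video-level class logits and multi-hot labels
    class_feats, centers:   (C, D) per-class video features and class centers
    count_pred, count_true: (C,) predicted and annotated action counts
    """
    eps = 1e-8
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Classification term: multi-label binary cross-entropy (assumed form).
    cls_term = -np.mean(labels * np.log(probs + eps)
                        + (1.0 - labels) * np.log(1.0 - probs + eps))
    # Only classes present in the video contribute to the other two terms.
    present = labels > 0
    n_present = max(int(present.sum()), 1)
    # Center term: pull each present class feature toward its class center.
    center_term = np.sum((class_feats[present] - centers[present]) ** 2) / n_present
    # Counting term: penalize count errors for present classes.
    count_term = np.sum((count_pred[present] - count_true[present]) ** 2) / n_present
    total = cls_term + w_center * center_term + w_count * count_term
    return total, (cls_term, center_term, count_term)
```

Returning the individual terms alongside the total makes it easier to monitor how each one contributes during training.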
We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances.
By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.