Temporal action localization with weak supervision, where only video-level labels are available for training.
In this work, we first identify two underexplored problems posed by weak supervision for temporal action localization, namely action completeness modeling and action-context separation.
This formulation does not fully model the problem: background frames are forced to be misclassified as action classes in order for the video-level labels to be predicted accurately.
In this paper, we first develop a novel weakly-supervised temporal action localization (TAL) framework called AutoLoc to directly predict the temporal boundary of each action instance.
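Direct boundary prediction of this kind is often driven by contrasting activations inside a candidate segment against those just outside it. The snippet below is a simplified numpy sketch of such an outer-inner contrast score on a toy 1-D class activation sequence; the function name, the inflation ratio, and the toy data are illustrative assumptions, not AutoLoc's actual implementation.

```python
import numpy as np

def outer_inner_contrast(cam, start, end, inflation=0.25):
    """Score a candidate segment [start, end) on a 1-D class activation
    sequence: mean activation inside the segment minus mean activation
    in a small inflated region just outside its boundaries.
    A tight, complete action proposal scores high; a loose proposal
    that leaks into background scores low. (Illustrative sketch.)"""
    length = end - start
    pad = max(1, int(inflation * length))
    inner = cam[start:end].mean()
    outer = np.concatenate([cam[max(0, start - pad):start],
                            cam[end:min(len(cam), end + pad)]])
    outer_mean = outer.mean() if outer.size else 0.0
    return inner - outer_mean

# Toy activation sequence with one clear action burst at frames 4..7.
cam = np.array([0.1, 0.1, 0.1, 0.2, 0.9, 0.95, 0.9, 0.85, 0.2, 0.1])
tight = outer_inner_contrast(cam, 4, 8)   # proposal matching the burst
loose = outer_inner_contrast(cam, 2, 10)  # proposal covering background too
```

Here the tight proposal scores higher than the loose one, which is the signal a boundary-prediction module can be trained to maximize.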
By maximizing the conditional probability with respect to the attention weights, action and non-action frames are well separated.
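The common mechanism behind this separation is attention-weighted temporal pooling: frame features are aggregated into a video-level representation using learned per-frame attention, so maximizing video-level classification accuracy pushes attention mass onto action frames. A minimal numpy sketch, with hypothetical toy features and logits:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(features, att_logits):
    """Aggregate per-frame features (T, D) into one video-level feature
    using attention weights over frames; background frames should
    receive low weight. (Illustrative sketch, not a specific paper's code.)"""
    w = softmax(att_logits)      # (T,) attention weights, sum to 1
    return w @ features, w       # (D,) pooled feature, plus the weights

# Toy example: 5 frames with 3-D features; frames 1-2 carry the action.
features = np.array([[0., 0., 1.],
                     [1., 0., 0.],
                     [1., 0., 0.],
                     [0., 1., 0.],
                     [0., 0., 1.]])
att_logits = np.array([-2., 3., 3., -1., -2.])  # high on action frames
video_feat, w = attention_pool(features, att_logits)
```

With attention concentrated on the action frames, the pooled feature is dominated by the action-class direction, which is what lets the video-level classifier supervise frame selection.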
Our joint formulation has three terms: a classification term that ensures separability of the learned action features; an adapted multi-label center loss term that enhances the discriminability of those features; and a counting loss term that delineates adjacent action sequences, leading to improved localization.
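A three-term objective of this shape can be sketched as a weighted sum. The version below is a hypothetical numpy mock-up, assuming sigmoid multi-label classification, a squared-distance center loss over the classes present in the video, and a squared-error counting loss; the weights and all names are illustrative, not the paper's actual formulation.

```python
import numpy as np

def joint_loss(video_feat, logits, labels, centers, pred_count, true_count,
               lam_center=0.1, lam_count=0.05):
    """Illustrative three-term objective:
    classification (multi-label BCE) + center loss pulling the video
    feature toward the centers of its ground-truth classes + counting
    loss matching the predicted number of action instances."""
    probs = 1.0 / (1.0 + np.exp(-logits))            # per-class sigmoid
    eps = 1e-8
    cls = -np.mean(labels * np.log(probs + eps)
                   + (1 - labels) * np.log(1 - probs + eps))
    present = labels.astype(bool)                    # classes in the video
    center = np.mean(np.sum((video_feat - centers[present]) ** 2, axis=1))
    count = float(pred_count - true_count) ** 2
    return cls + lam_center * center + lam_count * count

# Toy setup: 2 classes with 2-D feature centers; class 0 is present.
centers = np.array([[1., 0.], [0., 1.]])
labels = np.array([1, 0])
logits = np.array([2., -2.])
near = joint_loss(np.array([0.9, 0.1]), logits, labels, centers, 3, 3)
far  = joint_loss(np.array([0.0, 1.0]), logits, labels, centers, 3, 3)
```

A feature close to its class center incurs a lower loss than one near the wrong center, which is the discriminability effect the center-loss term is meant to provide.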
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances.
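A deep metric learning module of this kind is typically trained so that segments of the same action lie close in embedding space and segments of different actions lie far apart. A minimal sketch using cosine similarity and a triplet-style hinge loss; the embeddings and margin are toy assumptions, not the paper's actual module.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two segment embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss pushing same-action segments together and
    different-action segments apart; zero once the positive is more
    similar than the negative by at least the margin. (Sketch.)"""
    return max(0.0, cosine_sim(anchor, negative)
                    - cosine_sim(anchor, positive) + margin)

anchor   = np.array([1.0, 0.1, 0.0])   # a segment of action A
positive = np.array([0.9, 0.2, 0.1])   # another segment of action A
negative = np.array([0.0, 0.1, 1.0])   # a segment of a different action
loss = triplet_loss(anchor, positive, negative)
```

Here the anchor is already much more similar to the positive than to the negative, so the hinge is satisfied and the loss is zero; during training, hard triplets where this does not hold supply the gradient.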