Temporal Action Localization with weak supervision, where only video-level labels are available for training
In this work, we first identify two underexplored problems posed by weak supervision for temporal action localization, namely action completeness modeling and action-context separation.
#2 best model for Weakly Supervised Action Localization on ActivityNet-1.3
In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.
This formulation does not fully model the problem, because background frames are forced to be misclassified as action classes in order for the video-level labels to be predicted accurately.
Our joint formulation has three terms: a classification term to ensure the separability of the learned action features, an adapted multi-label center loss term to enhance action-feature discriminability, and a counting loss term to delineate adjacent action sequences, leading to improved localization.
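A minimal sketch of such a three-term objective, assuming a video-level feature vector, per-class logits, learnable class centers, and a scalar instance-count regression target (the weighting coefficients `lam_center` and `lam_count`, and the exact form of each term, are illustrative assumptions, not the paper's definitions):

```python
import numpy as np

def joint_loss(feat, logits, labels, pred_count, true_count, centers,
               lam_center=0.1, lam_count=0.05):
    """Sketch of a joint weakly-supervised objective with three terms:
    classification + adapted multi-label center loss + counting loss.
    feat:    (d,) video-level feature vector
    logits:  (C,) per-class video-level logits
    labels:  (C,) multi-hot video-level ground truth
    centers: (C, d) learnable per-class feature centers (assumed)
    """
    eps = 1e-8
    # 1) Classification term: multi-label binary cross-entropy on video labels,
    #    encouraging separability of the learned action features.
    probs = 1.0 / (1.0 + np.exp(-logits))
    cls = -np.mean(labels * np.log(probs + eps)
                   + (1 - labels) * np.log(1 - probs + eps))
    # 2) Multi-label center loss: pull the feature toward the center of each
    #    class present in the video, enhancing feature discriminability.
    present = np.nonzero(labels)[0]
    center = (np.mean([np.sum((feat - centers[c]) ** 2) for c in present])
              if len(present) else 0.0)
    # 3) Counting loss: regress the predicted number of action instances
    #    toward the true count, helping delineate adjacent action sequences.
    count = (pred_count - true_count) ** 2
    return cls + lam_center * center + lam_count * count
```

With confident correct logits, matching counts, and the feature sitting on its class centers, all three terms are near zero, so the total loss is small; each term grows as its corresponding prediction degrades.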
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
#5 best model for Weakly Supervised Action Localization on ActivityNet-1.3
We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances.
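The two modules above can be sketched as follows, assuming segment-level features fed to a shared linear classifier and a triplet-style metric-learning term (the triplet formulation, margin, and all parameter names are assumptions for illustration, not the paper's exact design):

```python
import numpy as np

def segment_classify(segment_feats, W, b):
    """Classification module sketch: per-segment action logits from a shared
    linear layer over (T, d) segment features; returns (T, C) logits."""
    return segment_feats @ W + b

def triplet_metric_loss(anchor, positive, negative, margin=1.0):
    """Deep-metric-learning module sketch (triplet form assumed): embeddings
    of the same action class are pulled together, different classes pushed
    apart by at least `margin` in squared Euclidean distance."""
    d_pos = np.sum((anchor - positive) ** 2)  # same-class distance
    d_neg = np.sum((anchor - negative) ** 2)  # cross-class distance
    return max(0.0, d_pos - d_neg + margin)
```

A usage pattern would be to score every segment with `segment_classify` for localization, while `triplet_metric_loss` shapes the embedding space so that instances of the same action land near one another.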