16 papers with code • 0 benchmarks • 0 datasets
This formulation does not fully model the problem: to predict video-level labels accurately, background frames are forced to be misclassified as action classes.
In this work, we first identify two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation.
Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles.
In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.
Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance action feature discriminability, and a counting loss term to delineate adjacent action sequences, leading to improved localization.
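The three-term joint loss above can be sketched as follows. This is a minimal illustrative version, assuming a binary cross-entropy classification term, a squared-distance center loss over the classes present in the video, and a squared-error counting term; the term weights (`lam_center`, `lam_count`) and all shapes are assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_loss(video_logits, video_labels, seg_features, class_centers,
               pred_counts, true_counts, lam_center=0.1, lam_count=0.1):
    """Sketch of a three-term joint objective (weights are assumptions).

    video_logits:  (C,)   video-level class logits
    video_labels:  (C,)   multi-hot ground-truth labels
    seg_features:  (T, D) per-segment features
    class_centers: (C, D) per-class feature centers
    pred_counts, true_counts: (C,) action-instance counts per class
    """
    eps = 1e-8
    # 1) Multi-label classification term: binary cross-entropy for separability.
    p = sigmoid(video_logits)
    cls = -np.mean(video_labels * np.log(p + eps)
                   + (1 - video_labels) * np.log(1 - p + eps))
    # 2) Adapted multi-label center loss: pull segment features toward the
    #    centers of the classes that actually appear in the video.
    active = np.flatnonzero(video_labels)
    center = 0.0
    for c in active:
        center += np.mean(np.sum((seg_features - class_centers[c]) ** 2, axis=1))
    center /= max(len(active), 1)
    # 3) Counting term: penalize deviation from the true instance count,
    #    which helps delineate adjacent action sequences.
    count = np.mean((pred_counts - true_counts) ** 2)
    return cls + lam_center * center + lam_count * count
```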
Ranked #1 on Action Classification on THUMOS’14
To learn completeness from the obtained sequence, we introduce two novel losses that contrast action instances with background ones in terms of action score and feature similarity, respectively.
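The two contrastive losses above can be sketched roughly as follows: one margin loss that pushes action-instance scores above background scores, and one that pulls action features together while pushing them away from background features. The margins and the use of cosine similarity are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def score_contrast(action_scores, bg_scores, margin=1.0):
    """Hinge loss: mean action score should exceed mean background score."""
    gap = np.mean(action_scores) - np.mean(bg_scores)
    return max(0.0, margin - gap)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def feature_contrast(action_feats, bg_feats, margin=0.5):
    """Action features should be more similar to each other than to background."""
    pos = np.mean([cosine(a1, a2)
                   for i, a1 in enumerate(action_feats)
                   for a2 in action_feats[i + 1:]])
    neg = np.mean([cosine(a, b) for a in action_feats for b in bg_feats])
    return max(0.0, margin - (pos - neg))
```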
Ranked #1 on Weakly Supervised Action Localization on THUMOS 2014
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances.
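The deep metric learning idea above is commonly realized with a triplet-style objective: embeddings of segments from the same action are pulled together and embeddings from different actions are pushed apart. The sketch below uses a squared-Euclidean triplet margin loss on L2-normalized embeddings; the margin and normalization are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: d(anchor, positive) should be smaller than
    d(anchor, negative) by at least `margin` (squared Euclidean distance)."""
    def l2_normalize(v):
        return v / (np.linalg.norm(v) + 1e-8)
    a, p, n = l2_normalize(anchor), l2_normalize(positive), l2_normalize(negative)
    d_pos = np.sum((a - p) ** 2)   # distance to a same-action segment
    d_neg = np.sum((a - n) ** 2)   # distance to a different-action segment
    return max(0.0, d_pos - d_neg + margin)
```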
Ranked #1 on Temporal Action Localization on ActivityNet-1.2
Traditional methods mainly focus on separating foreground from background frames using only a single attention branch and class activation sequence.