In this task, the training data consists of videos labeled only with the list of activities they contain, without any temporal boundary annotations. At test time, however, the algorithm must both recognize the activities in a given video and provide the start and end time of each one.
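To make the expected test-time output concrete, here is a minimal sketch of one common way to turn per-snippet scores for a single activity class into start and end times: thresholding the score sequence and merging consecutive above-threshold snippets. The function name `scores_to_segments`, the threshold of 0.5, and the fixed snippet duration are all illustrative assumptions, not part of any specific method listed here.

```python
from typing import List, Tuple


def scores_to_segments(
    scores: List[float],
    threshold: float = 0.5,            # assumed cut-off, chosen for illustration
    seconds_per_snippet: float = 1.0,  # assumed fixed snippet duration
) -> List[Tuple[float, float]]:
    """Group consecutive snippets whose score is >= threshold into (start, end) times."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first snippet of the currently open segment
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new segment opens here
        elif score < threshold and start is not None:
            # the segment closes at the beginning of the current snippet
            segments.append((start * seconds_per_snippet, i * seconds_per_snippet))
            start = None
    if start is not None:
        # a segment still open at the end of the video closes at the video end
        segments.append((start * seconds_per_snippet, len(scores) * seconds_per_snippet))
    return segments


# Hypothetical per-snippet scores for one activity class in one video.
example_scores = [0.1, 0.8, 0.9, 0.2, 0.7]
example_segments = scores_to_segments(example_scores)
```

During training only the video-level activity list is available, so scores like these have to be learned without ever seeing ground-truth boundaries.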
We exploit the learned models for action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet.
#5 best model for Weakly Supervised Action Localization on THUMOS 2014
We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos.
Most activity localization methods in the literature are burdened by the requirement for frame-wise annotation.
#2 best model for Weakly Supervised Action Localization on THUMOS 2014
In this work, we first identify two underexplored problems posed by weak supervision for temporal action localization, namely action completeness modeling and action-context separation.
Second, we propose an actor-based attention mechanism that enables the localization of the actions from action class labels and actor proposals and is end-to-end trainable.
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
#4 best model for Weakly Supervised Action Localization on THUMOS 2014
Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the action feature discriminability and a counting loss term to delineate adjacent action sequences, leading to improved localization.
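As a rough illustration of how a three-term objective of this kind can be combined, the sketch below adds a multi-label classification term, a center-style term, and a counting term into one weighted sum. It is not the paper's formulation: the concrete form of each term (binary cross-entropy, squared distance to a class center, absolute count error), the weights `w_center` and `w_count`, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def classification_term(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over classes (assumed form of the classification term)."""
    eps = 1e-8
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def center_term(features: Sequence[Sequence[float]],
                centers: Sequence[Sequence[float]],
                labels: Sequence[int]) -> float:
    """Mean squared distance between each present class's feature and its center."""
    active = [c for c, y in enumerate(labels) if y == 1]
    if not active:
        return 0.0
    total = 0.0
    for c in active:
        total += sum((f - m) ** 2 for f, m in zip(features[c], centers[c]))
    return total / len(active)


def counting_term(pred_counts: Sequence[float],
                  true_counts: Sequence[int],
                  labels: Sequence[int]) -> float:
    """Mean absolute error between predicted and labeled instance counts of present classes."""
    active = [c for c, y in enumerate(labels) if y == 1]
    if not active:
        return 0.0
    return sum(abs(pred_counts[c] - true_counts[c]) for c in active) / len(active)


def joint_loss(probs, labels, features, centers, pred_counts, true_counts,
               w_center: float = 1.0, w_count: float = 1.0) -> float:
    """Weighted sum of the three terms; the weights are hypothetical hyperparameters."""
    return (classification_term(probs, labels)
            + w_center * center_term(features, centers, labels)
            + w_count * counting_term(pred_counts, true_counts, labels))


# Toy video with two classes, only the first of which is present.
labels: List[int] = [1, 0]
probs = [0.9, 0.2]
features = [[1.0, 2.0], [0.0, 0.0]]
centers = [[1.0, 1.0], [5.0, 5.0]]
pred_counts = [3.0, 0.0]
true_counts = [2, 0]
loss = joint_loss(probs, labels, features, centers, pred_counts, true_counts)
```

In this toy setup the center and counting terms only look at classes present in the video, which is one plausible design choice: absent classes have no meaningful feature or count to pull toward.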