Temporal action localization with weak supervision, where only video-level labels are given for training

# Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization

In this work, we first identify two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation.

# Background Suppression Network for Weakly-supervised Temporal Action Localization

This formulation does not fully model the problem in that background frames are forced to be misclassified as action classes to predict video-level labels accurately.
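The issue can be seen in a toy calculation. The sketch below is illustrative only and is not taken from the paper: the logits are made-up numbers, and the extra "background" logit is an assumed remedy used here to show the contrast. With action classes only, a softmax has to place all of a frame's probability mass on some action class, even when the frame matches none of them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one background frame over 3 action classes.
# All are low: the frame does not resemble any action.
action_logits = [-2.0, -2.5, -3.0]

# Action classes only: the probabilities still sum to 1,
# so the frame is necessarily attributed to some action class.
p_actions_only = softmax(action_logits)
action_mass_only = sum(p_actions_only)

# Assumption for illustration: append one extra "background" logit
# (last position) so the frame has somewhere else to put its mass.
background_logit = 2.0
p_with_background = softmax(action_logits + [background_logit])
action_mass_with_bg = sum(p_with_background[:-1])

print(f"action mass, actions only:    {action_mass_only:.3f}")
print(f"action mass, with background: {action_mass_with_bg:.3f}")
```

In the second case most of the mass moves to the background entry, so the frame no longer has to be counted as evidence for an action class.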

# AutoLoc: Weakly-supervised Temporal Action Localization

In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.

# Weakly-Supervised Action Localization by Generative Attention Modeling

By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.

# 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization

Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the action feature discriminability and a counting loss term to delineate adjacent action sequences, leading to improved localization.
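A three-term objective of this shape can be sketched as a weighted sum. The code below is a minimal sketch under stated assumptions, not the authors' implementation: the binary cross-entropy form, the squared-distance center term, the absolute-error counting term, the weights `w_center` and `w_count`, and all function names are hypothetical choices made for illustration.

```python
import math

def classification_loss(probs, labels):
    """Assumed form: mean binary cross-entropy over video-level class labels."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return -total / len(labels)

def center_loss(class_features, centers, labels):
    """Assumed form: mean squared distance to the class center,
    taken only over classes present in the video."""
    present = [c for c, y in enumerate(labels) if y == 1]
    if not present:
        return 0.0
    total = 0.0
    for c in present:
        total += sum((f - m) ** 2 for f, m in zip(class_features[c], centers[c]))
    return total / len(present)

def counting_loss(pred_counts, true_counts, labels):
    """Assumed form: mean absolute count error over present classes."""
    present = [c for c, y in enumerate(labels) if y == 1]
    if not present:
        return 0.0
    return sum(abs(pred_counts[c] - true_counts[c]) for c in present) / len(present)

def joint_loss(cls, cen, cnt, w_center=0.1, w_count=0.1):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return cls + w_center * cen + w_count * cnt

# Toy video with two classes, only class 0 present.
labels = [1, 0]
cls = classification_loss([0.9, 0.2], labels)
cen = center_loss([[1.0, 2.0], [0.0, 0.0]], [[1.0, 1.0], [5.0, 5.0]], labels)
cnt = counting_loss([2.5, 0.0], [3, 0], labels)
total = joint_loss(cls, cen, cnt)
print(cls, cen, cnt, total)
```

Restricting the center and counting terms to classes present in the video is a design choice of this sketch, made so that absent classes do not pull features toward their centers.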

# Weakly Supervised Action Localization by Sparse Temporal Pooling Network

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
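The excerpt does not describe the pooling itself. One common reading of "sparse temporal pooling" is an attention-weighted sum of segment features with an L1 penalty that pushes most attention weights toward zero. The sketch below follows that reading as an assumption: the function names, the unnormalized weighted sum, and the toy values are all hypothetical, and no network is involved.

```python
def attention_pool(features, attention):
    """Attention-weighted sum of per-segment features.

    features:  list of T feature vectors (lists of floats)
    attention: list of T weights, assumed to lie in [0, 1]
    """
    dim = len(features[0])
    pooled = [0.0] * dim
    for feat, weight in zip(features, attention):
        for d in range(dim):
            pooled[d] += weight * feat[d]
    return pooled

def sparsity_penalty(attention):
    """L1 norm of the attention weights; smaller means sparser selection."""
    return sum(abs(a) for a in attention)

# Toy video: 3 segments with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention = [1.0, 0.0, 0.5]  # the middle segment is ignored

pooled = attention_pool(features, attention)
penalty = sparsity_penalty(attention)
print(pooled, penalty)
```

Under this reading, the pooled vector would feed a video-level classifier, and the penalty would be added to the classification loss so that only a few segments keep non-zero weight.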

