Weakly-supervised Temporal Action Localization
32 papers with code • 2 benchmarks • 2 datasets
Temporal Action Localization with weak supervision, where only video-level labels are given for training.
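To make the "video-level labels only" setting concrete, here is a minimal sketch of one common way to train from such labels: pool per-snippet class scores into a single video-level prediction and compare it with the video-level label. The top-k mean pooling, the function names, and the toy numbers are illustrative assumptions, not the method of any particular paper listed on this page.

```python
import math


def topk_mean(scores, k):
    """Average of the k largest per-snippet scores (k is assumed to be >= 1)."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)


def video_level_scores(snippet_logits, k):
    """Pool a T x C grid of snippet logits into C video-level probabilities."""
    num_classes = len(snippet_logits[0])
    probs = []
    for c in range(num_classes):
        pooled = topk_mean([row[c] for row in snippet_logits], k)
        probs.append(1.0 / (1.0 + math.exp(-pooled)))  # sigmoid
    return probs


def video_level_loss(probs, labels, eps=1e-7):
    """Binary cross-entropy against the multi-hot video-level labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical toy video: 4 snippets, 2 classes; only class 0 occurs.
# No per-snippet annotation is used, only the video-level label [1, 0].
snippet_logits = [[4.0, -3.0], [3.0, -2.0], [-2.0, -3.0], [-3.0, -2.0]]
probs = video_level_scores(snippet_logits, k=2)
loss = video_level_loss(probs, labels=[1, 0])
```

Because the loss only sees the pooled scores, the model is free to decide which snippets carry the action, which is why separating action from background is the recurring difficulty in the papers below.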
Most implemented papers
Weakly-Supervised Action Localization by Generative Attention Modeling
By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.
Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization
Two triplets of the feature space are considered in our approach: one triplet is used to learn discriminative features for each activity class, and the other is used to distinguish the features where no activity occurs (i.e., background features) from activity-related features for each video.
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization.
A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization
Moreover, our temporal semi-soft and hard attention modules, calculating two attention scores for each video snippet, help to focus on the less discriminative frames of an action to capture the full action boundary.
The Blessings of Unlabeled Background in Untrimmed Videos
The key challenge is how to distinguish the action-of-interest segments from the background, which is unlabelled even at the video level.
CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
In this paper, we argue that learning by comparing helps identify these hard snippets and we propose to utilize snippet Contrastive learning to Localize Actions, CoLA for short.
Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization
In this work, we argue that the features extracted from the pretrained extractor, e.g., I3D, are not WS-TAL task-specific features, and thus feature re-calibration is needed to reduce task-irrelevant information redundancy.
Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
To learn completeness from the obtained sequence, we introduce two novel losses that contrast action instances with background ones in terms of action score and feature similarity, respectively.
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
In this paper, we present a framework named FAC-Net based on the I3D backbone, on which three branches are appended: a class-wise foreground classification branch, a class-agnostic attention branch, and a multiple instance learning branch.
Background-Click Supervision for Temporal Action Localization
Weakly supervised temporal action localization aims at learning the instance-level action pattern from the video-level labels, where a significant challenge is action-context confusion.
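As a rough illustration of what "instance-level" output means in this task, the sketch below turns a sequence of per-snippet action scores into temporal segments by thresholding and grouping consecutive snippets. This is a generic post-processing step written under stated assumptions: the function name, the fixed threshold of 0.5, and the one-second snippet length are hypothetical and are not taken from any paper listed above.

```python
def scores_to_segments(scores, threshold=0.5, snippet_seconds=1.0):
    """Group consecutive snippets scoring above `threshold` into segments.

    Returns a list of (start_seconds, end_seconds, mean_score) tuples.
    """
    segments = []
    start = None  # index of the first snippet of the currently open segment
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t  # open a new segment
        elif s <= threshold and start is not None:
            seg = scores[start:t]  # close the segment just before snippet t
            segments.append(
                (start * snippet_seconds, t * snippet_seconds, sum(seg) / len(seg))
            )
            start = None
    if start is not None:  # a segment is still open at the end of the video
        seg = scores[start:]
        segments.append(
            (start * snippet_seconds, len(scores) * snippet_seconds, sum(seg) / len(seg))
        )
    return segments


# Hypothetical per-snippet scores for one action class.
scores = [0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.6]
segments = scores_to_segments(scores)
```

With a step like this, context snippets that score highly merely because they co-occur with the action end up inside the predicted segments, which is one way to picture the action-context confusion mentioned above.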