W-TALC: Weakly-supervised Temporal Activity Localization and Classification

ECCV 2018  ·  Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury

Most activity localization methods in the literature suffer from the burden of requiring frame-wise annotations. Learning from weak labels is a potential way to reduce this manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be exploited to temporally localize activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework that uses only video-level labels. The proposed network can be divided into two sub-networks, namely a Two-Stream feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets, Thumos14 and ActivityNet1.2, demonstrate that the proposed method can detect activities at a fine granularity and achieves better performance than current state-of-the-art methods.
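To make the weakly-supervised idea concrete, here is a minimal sketch of one ingredient such frameworks typically rely on: aggregating per-snippet class activations into a video-level prediction (k-max temporal pooling) and training it with a cross-entropy loss against the video-level tags, in the spirit of a multiple-instance learning loss. All function names, the top-k pooling choice, and the toy data below are illustrative assumptions, not the authors' exact implementation; W-TALC additionally uses a second, complementary loss across videos sharing tags.

```python
import numpy as np

def video_level_scores(snippet_scores, k):
    # snippet_scores: (T, C) per-snippet class activations.
    # Aggregate each class by averaging its top-k temporal activations
    # (k-max pooling), so only the most confident snippets contribute.
    topk = np.sort(snippet_scores, axis=0)[-k:]   # top-k rows per class
    return topk.mean(axis=0)                      # (C,) video-level scores

def mil_loss(snippet_scores, labels, k):
    # MIL-style loss sketch: softmax over the pooled video-level scores,
    # cross-entropy against the (possibly multi-hot) video-level labels.
    s = video_level_scores(snippet_scores, k)
    p = np.exp(s - s.max())
    p /= p.sum()                                  # softmax over classes
    y = labels / labels.sum()                     # normalized label vector
    return -(y * np.log(p + 1e-12)).sum()

# Toy example: 8 snippets, 3 classes; snippets 2..4 contain class-1 activity.
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 3))
scores[2:5, 1] += 3.0                              # boost the active segment
labels = np.array([0.0, 1.0, 0.0])
print(mil_loss(scores, labels, k=3))
```

Because only video-level labels enter the loss, the network is pushed to assign high activations to the snippets that actually contain the tagged activity, which is what later enables temporal localization at inference time.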

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Action Classification | ActivityNet-1.2 | W-TALC | mAP | 93.2 | #1 |
| Weakly Supervised Action Localization | ActivityNet-1.2 | W-TALC | mAP@0.5 | 37.0 | #11 |
| Action Classification | THUMOS’14 | W-TALC | mAP | 85.6 | #2 |
| Weakly Supervised Action Localization | THUMOS 2014 | W-TALC | mAP@0.5 | 22.8 | #17 |
| Weakly Supervised Action Localization | THUMOS 2014 | W-TALC | mAP@0.1:0.7 | - | #15 |
