Fast Weakly Supervised Action Segmentation Using Mutual Consistency

5 Apr 2019 · Yaser Souri, Mohsen Fayyaz, Luca Minciullo, Gianpiero Francesca, Juergen Gall

Action segmentation is the task of predicting the actions for each frame of a video. As obtaining the full annotation of videos for action segmentation is expensive, weakly supervised approaches that can learn only from transcripts are appealing. In this paper, we propose a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network. The two branches of our network predict two redundant but different representations for action segmentation and we propose a novel mutual consistency (MuCon) loss that enforces the consistency of the two redundant representations. Using the MuCon loss together with a loss for transcript prediction, our proposed approach achieves the accuracy of state-of-the-art approaches while being 14 times faster to train and 20 times faster during inference. The MuCon loss proves beneficial even in the fully supervised setting.
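
The abstract does not say what the two redundant representations are. As a rough illustration only, the sketch below assumes one is a per-frame label sequence and the other is a transcript (ordered action list) paired with segment lengths; this pairing, the helper names, and the action labels are all hypothetical. It shows that the two forms carry the same information and how their disagreement could be measured. It is a plain, non-differentiable check, not the MuCon loss itself.

```python
from itertools import groupby


def to_transcript(frame_labels):
    """Collapse per-frame labels into (transcript, lengths)."""
    transcript, lengths = [], []
    for label, run in groupby(frame_labels):
        transcript.append(label)
        lengths.append(len(list(run)))
    return transcript, lengths


def to_frames(transcript, lengths):
    """Expand (transcript, lengths) back into per-frame labels."""
    return [label for label, n in zip(transcript, lengths) for _ in range(n)]


def disagreement(frame_labels, transcript, lengths):
    """Fraction of frames where the two representations differ.

    Assumes both representations cover the same number of frames.
    """
    expanded = to_frames(transcript, lengths)
    mismatches = sum(a != b for a, b in zip(frame_labels, expanded))
    return mismatches / len(frame_labels)


# Hypothetical action names, for illustration only.
frames = ["take_cup"] * 3 + ["pour_milk"] * 2
transcript, lengths = to_transcript(frames)
print(transcript, lengths)
print(disagreement(frames, ["take_cup", "pour_milk"], [2, 3]))
```

When both forms describe the same segmentation the disagreement is zero; a training loss built on this idea would need a differentiable counterpart of the expansion step.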


Datasets


Task                                                Dataset    Model  Metric Name  Metric Value  Global Rank
Weakly Supervised Action Segmentation (Transcript)  Breakfast  MuCon  Acc          48.5          #4
Action Segmentation                                 Breakfast  MuCon  F1@10%       73.2          #19
Action Segmentation                                 Breakfast  MuCon  F1@50%       48.4          #22
Action Segmentation                                 Breakfast  MuCon  Acc          62.8          #31
Action Segmentation                                 Breakfast  MuCon  Edit         76.3          #9
Action Segmentation                                 Breakfast  MuCon  F1@25%       66.1          #20
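
For readers unfamiliar with the metrics in the table, here is a minimal sketch of how Acc (frame-wise accuracy), Edit (segmental edit score), and F1@k (segmental F1 at an IoU threshold) are commonly computed in the action segmentation literature. The function names are illustrative, and the matching rule in `f1_at` is a simplified variant; the benchmark's official evaluation code may differ in details such as background handling.

```python
from itertools import groupby


def segments(labels):
    """Collapse frame labels into (label, start, end) runs; end is exclusive."""
    out, start = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        out.append((label, start, start + n))
        start += n
    return out


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred, gt):
    """100 * (1 - normalized Levenshtein distance) over segment label sequences."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    prev = list(range(len(g) + 1))
    for i, a in enumerate(p, 1):
        cur = [i]
        for j, b in enumerate(g, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return 100.0 * (1 - prev[-1] / max(len(p), len(g), 1))


def f1_at(pred, gt, threshold):
    """Segmental F1: a predicted segment is a true positive if its IoU with an
    unmatched ground-truth segment of the same label reaches the threshold."""
    p_segs, g_segs = segments(pred), segments(gt)
    matched = [False] * len(g_segs)
    tp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (gl, gs, ge) in enumerate(g_segs):
            if gl != label or matched[j]:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            span = max(pe, ge) - min(ps, gs)
            iou = inter / span
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= threshold:
            tp += 1
            matched[best_j] = True
    fp, fn = len(p_segs) - tp, len(g_segs) - tp
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy example: one boundary is predicted a frame early.
gt = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
print(frame_accuracy(pred, gt), edit_score(pred, gt), f1_at(pred, gt, 0.5))
```

Acc penalizes every misplaced frame, while Edit and F1@k operate on segments, so a small boundary shift like the one in the toy example lowers Acc but leaves Edit and F1@50% untouched.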

Methods