Boundary-Aware Cascade Networks for Temporal Action Segmentation

Identifying human action segments in an untrimmed video is still challenging due to boundary ambiguity and over-segmentation issues. To address these problems, we present a new boundary-aware cascade network by introducing two novel components... First, we devise a new cascading paradigm, called Stage Cascade, to enable our model to have adaptive receptive fields and more confident predictions for ambiguous frames. Second, we design a general and principled smoothing operation, termed as local barrier pooling, to aggregate local predictions by leveraging semantic boundary information. Moreover, these two components can be jointly fine-tuned in an end-to-end manner. We perform experiments on three challenging datasets: 50Salads, GTEA and Breakfast dataset, demonstrating that our framework significantly out-performs the current state-of-the-art methods. The code is available at read more

PDF Abstract

Results from the Paper

Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Action Segmentation 50 Salads BCN F1@10% 82.3 # 8
Edit 74.3 # 9
Acc 84.4 # 5
F1@25% 81.3 # 8
F1@50% 74 # 6
Action Segmentation Breakfast BCN F1@10% 68.7 # 11
F1@50% 55.0 # 8
Acc 70.4 # 5
Edit 66.2 # 12
F1@25% 65.5 # 11
Action Segmentation GTEA BCN F1@10% 88.5 # 10
F1@50% 77.3 # 6
Acc 79.8 # 4
Edit 84.4 # 8
F1@25% 87.1 # 9


No methods listed for this paper. Add relevant methods here