Boundary-Aware Cascade Networks for Temporal Action Segmentation

Identifying human action segments in an untrimmed video is still challenging due to boundary ambiguity and over-segmentation issues. To address these problems, we present a new boundary-aware cascade network by introducing two novel components... First, we devise a new cascading paradigm, called Stage Cascade, to enable our model to have adaptive receptive fields and more confident predictions for ambiguous frames. Second, we design a general and principled smoothing operation, termed as local barrier pooling, to aggregate local predictions by leveraging semantic boundary information. Moreover, these two components can be jointly fine-tuned in an end-to-end manner. We perform experiments on three challenging datasets: 50Salads, GTEA and Breakfast dataset, demonstrating that our framework significantly out-performs the current state-of-the-art methods. The code is available at https://github.com/MCG-NJU/BCN. read more

PDF Abstract

Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Action Segmentation 50 Salads BCN F1@10% 82.3 # 8
Edit 74.3 # 9
Acc 84.4 # 5
F1@25% 81.3 # 8
F1@50% 74 # 6
Action Segmentation Breakfast BCN F1@10% 68.7 # 11
F1@50% 55.0 # 8
Acc 70.4 # 5
Edit 66.2 # 12
F1@25% 65.5 # 11
Action Segmentation GTEA BCN F1@10% 88.5 # 10
F1@50% 77.3 # 6
Acc 79.8 # 4
Edit 84.4 # 8
F1@25% 87.1 # 9

Methods


No methods listed for this paper. Add relevant methods here