Maximization and restoration: Action segmentation through dilation passing and temporal reconstruction

Action segmentation aims to split videos into segments of different actions. Recent work focuses on capturing long-range dependencies in long, untrimmed videos, but still suffers from over-segmentation and from performance saturation as model complexity grows. This paper addresses these issues through a divide-and-conquer strategy that first maximizes the frame-wise classification accuracy of the model and then reduces over-segmentation errors. The strategy is implemented as the Dilation Passing and Reconstruction Network (DPRN), composed of the Dilation Passing Network, which increases accuracy by propagating information across different dilations, and the Temporal Reconstruction Network, which reduces over-segmentation errors by temporally encoding and decoding the output features of the Dilation Passing Network. We also propose a weighted temporal mean squared error loss that further reduces over-segmentation. Through evaluations on the 50Salads, GTEA, and Breakfast datasets, we show that our model achieves strong results compared to existing state-of-the-art models.
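The abstract does not give the exact form of the proposed weighted temporal mean squared error loss. As a rough, hedged sketch: losses of this family (e.g. the truncated temporal MSE used in MS-TCN) penalize large jumps between the log-probabilities of adjacent frames, and a "weighted" variant would scale that penalty per transition. The `weights` argument and function name below are hypothetical illustrations, not the paper's actual formulation.

```python
import numpy as np

def weighted_temporal_mse(log_probs, weights, tau=4.0):
    """Sketch of a weighted, truncated temporal MSE smoothing loss.

    log_probs : (C, T) array of frame-wise class log-probabilities.
    weights   : (T-1,) per-transition weights (hypothetical; the paper's
                actual weighting scheme is not specified in the abstract).
    tau       : truncation threshold limiting the penalty of large jumps,
                as in the standard truncated T-MSE.
    """
    # absolute change in log-probability between adjacent frames
    delta = np.abs(log_probs[:, 1:] - log_probs[:, :-1])
    # truncate so that genuine action boundaries are not over-penalized
    delta = np.minimum(delta, tau)
    # weighted mean of squared (truncated) differences
    return float(np.mean(weights[None, :] * delta ** 2))
```

Smoothing terms like this are typically added to the frame-wise cross-entropy loss with a small coefficient, trading a little accuracy for far fewer spurious segment boundaries.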

Results from the Paper


Task                  Dataset     Model   Metric   Value   Global Rank
Action Segmentation   50 Salads   DPRN    F1@10%   87.8    #9
Action Segmentation   50 Salads   DPRN    F1@25%   86.3    #10
Action Segmentation   50 Salads   DPRN    F1@50%   79.4    #10
Action Segmentation   50 Salads   DPRN    Edit     82.0    #9
Action Segmentation   50 Salads   DPRN    Acc      87.2    #9
Action Segmentation   Breakfast   DPRN    F1@10%   75.6    #12
Action Segmentation   Breakfast   DPRN    F1@25%   70.5    #11
Action Segmentation   Breakfast   DPRN    F1@50%   57.6    #10
Action Segmentation   Breakfast   DPRN    Edit     75.1    #11
Action Segmentation   Breakfast   DPRN    Acc      71.7    #11
Action Segmentation   GTEA        DPRN    F1@10%   92.9    #4
Action Segmentation   GTEA        DPRN    F1@25%   92.0    #3
Action Segmentation   GTEA        DPRN    F1@50%   82.9    #5
Action Segmentation   GTEA        DPRN    Edit     90.9    #5
Action Segmentation   GTEA        DPRN    Acc      82.0    #4
