Video Action Segmentation via Contextually Refined Temporal Keypoints

ICCV 2023  ·  Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu

Video action segmentation refers to the task of densely classifying each frame or short segment of an untrimmed video into pre-specified action categories. Although recent years have witnessed great progress in action segmentation techniques, a large body of existing methods still relies on frame-wise classification, which tends to produce fragmentary results (i.e., over-segmentation). To effectively address these issues, we propose a video action segmentation model built on the novel idea of Refined Temporal Keypoints (RTK), which overcomes the caveats of existing methods. The proposed model first seeks high-quality, sparse temporal keypoints by extracting non-local cues from the video, rather than conducting frame-wise classification as in many competing methods. Large improvements over the initial temporal keypoints are then obtained through further refinement and re-assembling operations. Specifically, we develop a graph matching module that aggregates structural information across temporal keypoints by learning the correspondence between temporal source graphs and annotated target graphs. The initial temporal keypoints are refined with the encoded structural information by reusing the graph matching module. A small set of prior rules is applied to post-process and re-assemble all temporal keypoints, and the keypoints that survive all refinement stages are used to generate the final action segmentation results. We perform experiments on three popular datasets: 50Salads, GTEA, and Breakfast. Our method significantly outperforms current methods, achieving state-of-the-art F1@50 scores of 83.4%, 79.5%, and 60.5% on the three datasets, respectively.
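
The abstract outlines a pipeline of sparse keypoint detection, graph-matching-based refinement, and rule-based re-assembly. The sketch below is a minimal, hypothetical illustration of the graph matching refinement step only, assuming a Sinkhorn-style soft assignment between detected source-graph nodes and annotated target-graph nodes; the module names, feature dimensions, and matching scheme are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch of an RTK-style graph matching refinement step.
# All names, shapes, and the Sinkhorn-based matching are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class GraphMatchingModule(nn.Module):
    """Scores correspondences between a source graph of detected temporal
    keypoints and a target graph of annotated keypoints (assumed design)."""

    def __init__(self, dim: int = 128, sinkhorn_iters: int = 20):
        super().__init__()
        self.embed = nn.Linear(dim, dim)
        self.sinkhorn_iters = sinkhorn_iters

    def forward(self, src_nodes: torch.Tensor, tgt_nodes: torch.Tensor) -> torch.Tensor:
        # src_nodes: (N, dim) features of detected keypoints
        # tgt_nodes: (M, dim) features of annotated keypoints
        affinity = self.embed(src_nodes) @ self.embed(tgt_nodes).T  # (N, M)
        log_p = affinity
        # Log-domain Sinkhorn normalization -> soft correspondence matrix
        for _ in range(self.sinkhorn_iters):
            log_p = log_p - log_p.logsumexp(dim=1, keepdim=True)
            log_p = log_p - log_p.logsumexp(dim=0, keepdim=True)
        return log_p.exp()  # (N, M)


def refine_keypoints(keypoint_feats: torch.Tensor,
                     target_feats: torch.Tensor,
                     matcher: GraphMatchingModule) -> torch.Tensor:
    """Refine initial keypoint features by aggregating matched target structure."""
    assignment = matcher(keypoint_feats, target_feats)  # (N, M) soft assignment
    aggregated = assignment @ target_feats              # (N, dim) matched context
    return keypoint_feats + aggregated                  # residual refinement


if __name__ == "__main__":
    torch.manual_seed(0)
    detected = torch.randn(12, 128)   # sparse temporal keypoints from the video
    annotated = torch.randn(10, 128)  # nodes of the annotated target graph
    matcher = GraphMatchingModule(dim=128)
    refined = refine_keypoints(detected, annotated, matcher)
    print(refined.shape)  # torch.Size([12, 128])
```

In this sketch, the refined keypoint features would then be passed to the rule-based post-processing and re-assembly stage described in the abstract.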
