Actor-Action Semantic Segmentation with Grouping Process Models

CVPR 2016  ·  Chenliang Xu, Jason J. Corso ·

Actor-action semantic segmentation made an important step toward advanced video understanding problems: what action is happening; who is performing the action; and where is the action in space-time. Current models for this problem are local, based on layered CRFs, and are unable to capture long-ranging interaction of video parts. We propose a new model that combines these local labeling CRFs with a hierarchical supervoxel decomposition. The supervoxels provide cues for possible groupings of nodes, at various scales, in the CRFs to encourage adaptive, high-order groups for more effective labeling. Our model is dynamic and continuously exchanges information during inference: the local CRFs influence what supervoxels in the hierarchy are active, and these active nodes influence the connectivity in the CRF; we hence call it a grouping process model. The experimental results on a recent large-scale video dataset show a large margin of 60% relative improvement over the state of the art, which demonstrates the effectiveness of the dynamic, bidirectional flow between labeling and grouping.

PDF Abstract CVPR 2016 PDF CVPR 2016 Abstract

Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here