Paper tables with annotated results for Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

Paper

Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation, by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames. The predicted depth, egomotion, and camera intrinsics are used to provide an additional supervision signal to the segmentation model, significantly enhancing its quality, or, alternatively, reducing the number of labels the segmentation model needs. Our experiments were performed on the ScanNet dataset.

PDF Paper record

Results in Papers With Code

(↓ scroll down to see all results)

Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

Reader Guidelines

Editor Guidelines