Context-Aware Video Instance Segmentation

3 Jul 2024 · Seunghun Lee, Jiwan Seo, Kiljoon Han, Minwoo Choi, Sunghoon Im

In this paper, we introduce Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. To efficiently extract and leverage this information, we propose the Context-Aware Instance Tracker (CAIT), which merges contextual data surrounding the instances with the core instance features to improve tracking accuracy. Additionally, we introduce the Prototypical Cross-frame Contrastive (PCC) loss, which enforces consistency in object-level features across frames and thereby significantly enhances instance matching accuracy. CAVIS demonstrates superior performance over state-of-the-art methods on all benchmark datasets in video instance segmentation (VIS) and video panoptic segmentation (VPS). Notably, our method excels on the OVIS dataset, which is known for its particularly challenging videos.
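
The abstract does not give the PCC formulation, so the following is only a minimal sketch of a generic InfoNCE-style cross-frame contrastive loss, included to illustrate what "consistency in object-level features across frames" can mean in practice. The function name, the feature layout (one embedding per instance, with index i referring to the same instance in both frames), and the temperature value are all assumptions for illustration; none of them come from the paper, and the prototype construction used by PCC is not modeled here.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_frame_contrastive_loss(feats_t, feats_t1, temperature=0.1):
    """Generic InfoNCE-style loss between two frames (hypothetical, not the paper's PCC).

    feats_t[i] and feats_t1[i] are assumed to be embeddings of the same
    instance i in two frames; all other instances in feats_t1 act as negatives.
    """
    a = [_normalize(v) for v in feats_t]
    b = [_normalize(v) for v in feats_t1]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of the anchor to every instance in the other frame.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature for other in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching instance.
        total += log_denom - logits[i]
    return total / len(a)


# Toy usage: two instances whose features stay consistent vs. swap identities.
frame_t = [[1.0, 0.0], [0.0, 1.0]]
frame_t1_consistent = [[0.9, 0.1], [0.1, 0.9]]
frame_t1_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_consistent = cross_frame_contrastive_loss(frame_t, frame_t1_consistent)
loss_swapped = cross_frame_contrastive_loss(frame_t, frame_t1_swapped)
```

In this toy setup the loss is lower when each instance's features stay close to their own counterpart in the next frame than when the identities are swapped, which is the property a matching step based on feature similarity relies on.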

Results from the Paper


 Ranked #1 on Video Instance Segmentation on OVIS validation (using extra training data)
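Task                         Dataset                      Model                   Metric    Value   Global Rank
Video Instance Segmentation  OVIS validation              CAVIS(VIT-L, Offline)   mask AP   57.1    # 1
Video Instance Segmentation  OVIS validation              CAVIS(VIT-L, Offline)   AP50      82.6    # 2
Video Instance Segmentation  OVIS validation              CAVIS(VIT-L, Offline)   AP75      63.5    # 1
Video Instance Segmentation  OVIS validation              CAVIS(VIT-L, Offline)   AR1       21.2    # 1
Video Instance Segmentation  OVIS validation              CAVIS(VIT-L, Offline)   AR10      61.8    # 1
Video Panoptic Segmentation  VIPSeg                       CAVIS(VIT-L)            VPQ       58.5    # 1
Video Panoptic Segmentation  VIPSeg                       CAVIS(VIT-L)            STQ       56.1    # 2
Video Instance Segmentation  YouTube-VIS 2021             CAVIS(VIT-L, Offline)   mask AP   65.3    # 1
Video Instance Segmentation  YouTube-VIS 2021             CAVIS(VIT-L, Offline)   AP50      87.3    # 1
Video Instance Segmentation  YouTube-VIS 2021             CAVIS(VIT-L, Offline)   AP75      73.2    # 1
Video Instance Segmentation  YouTube-VIS 2021             CAVIS(VIT-L, Offline)   AR10      70.3    # 2
Video Instance Segmentation  YouTube-VIS 2021             CAVIS(VIT-L, Offline)   AR1       49.7    # 1
Video Instance Segmentation  YouTube-VIS 2022 Validation  CAVIS (VIT-L)           mAP_L     48.6    # 2
Video Instance Segmentation  YouTube-VIS validation       CAVIS(VIT-L, Offline)   mask AP   69.4    # 1
Video Instance Segmentation  YouTube-VIS validation       CAVIS(VIT-L, Offline)   AP50      90.9    # 1
Video Instance Segmentation  YouTube-VIS validation       CAVIS(VIT-L, Offline)   AP75      77.2    # 1
Video Instance Segmentation  YouTube-VIS validation       CAVIS(VIT-L, Offline)   AR1       58.3    # 2
Video Instance Segmentation  YouTube-VIS validation       CAVIS(VIT-L, Offline)   AR10      74.7    # 2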