While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details. We identify the coarse boundary annotations of the popular YouTube-VIS dataset as a major limiting factor. To benchmark high-quality mask prediction for VIS, we introduce the HQ-YTVIS dataset and the Tube-Boundary AP metric (ECCV 2022). HQ-YTVIS consists of a manually re-annotated test set and automatically refined training data, providing training, validation, and testing support to facilitate the development of VIS methods that aim at higher mask quality.