UVO is a new benchmark for open-world, class-agnostic object segmentation in videos. Besides shifting the problem focus to the open-world setup, UVO is significantly larger, providing approximately 8 times more videos than DAVIS and 7 times more mask (instance) annotations per video than YouTube-VOS and YouTube-VIS. UVO is also more challenging, as it includes many videos with crowded scenes and complex background motions. Some highlights of the dataset include:
High-quality instance masks densely annotated at 30 fps on 1024 YouTube videos and at 1 fps on 10337 videos from the Kinetics dataset
Open-world: annotating all objects in each video, 13.5 objects per video on average
Diverse object categories: 57% of objects are not covered by COCO categories