1 dataset result for segmentation AND Referring Video Object Segmentation AND Texts

Previous works [6, 10] have constructed referring segmentation datasets for videos. Each video has pixel-level instance segmentation annotations on every 5th frame of 30-fps footage, and clip durations range from about 3 to 6 seconds.
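As a rough illustration of what this sampling scheme implies, the sketch below counts the annotated frames per clip. It assumes annotation starts at frame 0 and then falls on every 5th frame. The starting offset and the helper name `annotated_frames` are hypothetical and not taken from the source. Under these assumptions, a 3-second clip yields 18 annotated frames and a 6-second clip yields 36, or about 6 annotated frames per second.

```python
# Hypothetical sketch: frame indices annotated under an "every 5th frame" scheme.
# Assumption (not stated in the source): annotation starts at frame 0.
FPS = 30     # frame rate of the source videos
STRIDE = 5   # one annotated frame every 5 frames


def annotated_frames(duration_s: float, fps: int = FPS, stride: int = STRIDE) -> list:
    """Return the indices of annotated frames in a clip of the given duration."""
    total_frames = int(round(duration_s * fps))
    return list(range(0, total_frames, stride))


# Clip durations at the two ends of the reported 3-6 second range.
for duration in (3, 6):
    frames = annotated_frames(duration)
    print(duration, len(frames), len(frames) / duration)  # 3 18 6.0, then 6 36 6.0
```

Varying `STRIDE` or `FPS` shows how annotation density scales. The frame rate divided by the stride sets the per-second rate, independent of clip length.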

34 PAPERS • 3 BENCHMARKS

