Actor and Action Video Segmentation from a Sentence

This paper strives for pixel-level segmentation of actors and their actions in video content. Different from existing works, which all learn to segment from a fixed vocabulary of actor and action pairs, we infer the segmentation from a natural language input sentence. This allows us to distinguish between fine-grained actors in the same super-category, identify actor and action instances, and segment pairs that are outside of the actor and action vocabulary. We propose a fully-convolutional model for pixel-level actor and action segmentation using an encoder-decoder architecture optimized for video. To show the potential of actor and action video segmentation from a sentence, we extend two popular actor and action datasets with more than 7,500 natural language descriptions. Experiments demonstrate the quality of the sentence-guided segmentations, the generalization ability of our model, and its advantage for traditional actor and action segmentation compared to the state-of-the-art.

CVPR 2018

Datasets


Introduced in the Paper:

A2D Sentences

Used in the Paper:

JHMDB
A2D

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Referring Expression Segmentation | A2D Sentences | Gavrilyuk et al. | Precision@0.5 | 0.475 | # 19 |
| | | | Precision@0.9 | 0.002 | # 19 |
| | | | IoU overall | 0.536 | # 19 |
| | | | IoU mean | 0.421 | # 19 |
| | | | Precision@0.6 | 0.347 | # 18 |
| | | | Precision@0.7 | 0.211 | # 18 |
| | | | Precision@0.8 | 0.08 | # 18 |
| | | | AP | 0.198 | # 14 |
| Referring Expression Segmentation | A2D Sentences | Gavrilyuk et al. (Optical flow) | Precision@0.5 | 0.5 | # 16 |
| | | | Precision@0.9 | 0.004 | # 18 |
| | | | IoU overall | 0.551 | # 18 |
| | | | IoU mean | 0.426 | # 18 |
| | | | Precision@0.6 | 0.376 | # 17 |
| | | | Precision@0.7 | 0.231 | # 17 |
| | | | Precision@0.8 | 0.094 | # 17 |
| | | | AP | 0.215 | # 13 |
| Referring Expression Segmentation | J-HMDB | Gavrilyuk et al. (Optical flow) | Precision@0.5 | 0.712 | # 13 |
| | | | Precision@0.6 | 0.518 | # 14 |
| | | | Precision@0.7 | 0.264 | # 15 |
| | | | Precision@0.8 | 0.030 | # 16 |
| | | | Precision@0.9 | 0.000 | # 8 |
| | | | AP | 0.267 | # 10 |
| | | | IoU overall | 0.555 | # 12 |
| | | | IoU mean | 0.570 | # 12 |
| Referring Expression Segmentation | J-HMDB | Gavrilyuk et al. | Precision@0.5 | 0.699 | # 14 |
| | | | Precision@0.6 | 0.460 | # 16 |
| | | | Precision@0.7 | 0.173 | # 16 |
| | | | Precision@0.8 | 0.014 | # 17 |
| | | | Precision@0.9 | 0.000 | # 8 |
| | | | AP | 0.233 | # 12 |
| | | | IoU overall | 0.541 | # 15 |
| | | | IoU mean | 0.542 | # 15 |
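
The metric names above are not defined on this page. As a rough guide, the sketch below shows one common way such quantities are computed for binary segmentation masks. It is an illustrative assumption, not the benchmark's official evaluation code: the function names are hypothetical, "IoU overall" is taken to be total intersection over total union across all samples, "IoU mean" the average of per-sample IoU, and Precision@K the fraction of samples whose IoU is strictly greater than K. AP is left out because its exact definition is not stated here.

```python
from typing import Dict, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask given as rows of 0/1 values


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count the pixels in the intersection and union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def segmentation_metrics(
    preds: Sequence[Mask],
    targets: Sequence[Mask],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, float]:
    """Compute overall IoU, mean IoU, and Precision@K (assumed definitions)."""
    if len(preds) == 0 or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal in length")

    total_inter = 0
    total_union = 0
    ious = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
        # Assumption: a sample with an empty union is scored as IoU 0.
        ious.append(inter / union if union else 0.0)

    n = len(ious)
    metrics = {
        # Pixels pooled over all samples, so large objects weigh more.
        "IoU overall": total_inter / total_union if total_union else 0.0,
        # Every sample contributes equally, regardless of object size.
        "IoU mean": sum(ious) / n,
    }
    for k in thresholds:
        # Assumption: strict inequality; some evaluations may use >= instead.
        metrics[f"Precision@{k}"] = sum(1 for iou in ious if iou > k) / n
    return metrics
```

Under these definitions, overall IoU and mean IoU can differ noticeably on the same predictions, since the former is dominated by large masks while the latter treats every sample alike.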
