Modular Interactive VOS

Introduced by Cheng et al. in Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

MiVOS is an interactive video object segmentation model that decouples interaction-to-mask from mask propagation. Because interaction is separated from propagation, MiVOS is not limited to any particular type of user interaction. It uses three modules: Interaction-to-Mask, Propagation and Difference-Aware Fusion. The interaction module, trained separately, converts user interactions into an object mask, which the propagation module then propagates temporally using a novel top-k filtering strategy when reading the space-time memory. To take the user's intent into account, a novel difference-aware module learns how to fuse the masks from before and after each interaction, which are aligned with the target frames by employing the space-time memory.
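The top-k filtering idea in the memory read can be illustrated with a minimal sketch: for one query position, only the k best-matching memory entries contribute to the softmax-weighted readout, and all others receive zero weight. This is an illustration under assumptions, not the authors' implementation: the function name `topk_memory_read`, the dot-product affinity, and the list-based vectors are hypothetical choices made for this example.

```python
import math


def topk_memory_read(query, mem_keys, mem_values, k):
    """Read from memory for a single query vector with top-k filtered attention.

    Assumption: affinity is a plain dot product; the actual model may use a
    different similarity and operates on dense feature maps.
    """
    if not 1 <= k <= len(mem_keys):
        raise ValueError("k must be between 1 and the number of memory entries")

    # Affinity between the query and every memory key.
    scores = [sum(q * m for q, m in zip(query, key)) for key in mem_keys]

    # Indices of the k highest-scoring memory entries.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax restricted to the kept entries (max subtracted for stability).
    peak = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(scores))]

    # Weighted sum of the memory values of the kept entries.
    dim = len(mem_values[0])
    readout = [sum(weights[i] * mem_values[i][d] for i in top) for d in range(dim)]
    return readout, weights


# Toy memory with three entries and two-dimensional keys.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
readout, weights = topk_memory_read([1.0, 0.0], keys, values, k=2)
```

With k=2 in the toy example, the third memory entry (the worst match) gets exactly zero weight, so it cannot contribute noise to the readout; with k equal to the memory size, the function reduces to an ordinary softmax read.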


Tasks

Task	Papers	Share
Interactive Video Object Segmentation	1	20.00%
Semantic Segmentation	1	20.00%
Semi-Supervised Video Object Segmentation	1	20.00%
Video Object Segmentation	1	20.00%
Video Semantic Segmentation	1	20.00%

Categories

Video Object Segmentation Models