The semantic segmentation task is to assign a label from a label set to each pixel in an image. In the case of fully supervised setting, the dataset consists of images and their corresponding pixel-level class-specific annotations (expensive pixel-level annotations). However, in the weakly-supervised setting, the dataset consists of images and corresponding annotations that are relatively easy to obtain, such as tags/labels of objects present in the image
We present a weakly supervised model that jointly performs both semantic- and instance-segmentation -- a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks.
Weakly supervised semantic segmentation has been a subject of increased interest due to the scarcity of fully annotated images.
Inspired by the traditional image segmentation methods of seeded region growing, we propose to train a semantic segmentation network starting from the discriminative regions and progressively increase the pixel-level supervision using by seeded region growing.
To the best of our knowledge, the method of [Pathak et al., 2015] is the only prior work that addresses deep CNNs with linear constraints in weakly supervised segmentation.
Training a Convolutional Neural Network (CNN) for semantic segmentation typically requires to collect a large amount of accurate pixel-level annotations, a hard and expensive task.
We propose an approach to discover class-specific pixels for the weakly-supervised semantic segmentation task.
At each time step, the system receives as input a video frame, predicts the optical flow based on the current observation and the LSTM memory state as a dense transformation map, and applies it to the current frame to generate the next frame.