Semantic segmentation, a form of image segmentation, is the task of grouping together the parts of an image that belong to the same object class. It is a pixel-level prediction task: every pixel in an image is assigned to a category. Example benchmarks for this task include Cityscapes, PASCAL VOC and ADE20K. Models are usually evaluated with the mean Intersection-over-Union (mean IoU) and pixel accuracy metrics.
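
As a rough illustration of the two metrics named above, the sketch below computes pixel accuracy and mean IoU from a confusion matrix using NumPy. The function names and the tiny label maps are hypothetical, chosen only for this example. Individual benchmarks may apply their own conventions (for instance, ignoring certain labels), which this sketch does not model.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (true class, predicted class) pair."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (true, pred) pair as a single index, then histogram.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0  # skip classes absent from both maps (assumption)
    return (tp[valid] / union[valid]).mean()

# Hypothetical 2x3 label maps with three classes (0, 1, 2).
y_true = [[0, 0, 1],
          [1, 2, 2]]
y_pred = [[0, 1, 1],
          [1, 2, 0]]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = pixel_accuracy(cm)   # 4 of 6 pixels are correct
miou = mean_iou(cm)        # per-class IoU: 1/3, 2/3, 1/2
print(acc, miou)
```

Pixel accuracy rewards getting large regions right, whereas mean IoU weights every class equally, so a model that ignores small classes can score well on the former and poorly on the latter.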

The surprising impact of mask-head architecture on novel class segmentation

Within this family, we show that the architecture of the mask-head plays a surprisingly important role in generalization to classes for which we do not observe masks during training.

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

We view this work as a notable step towards building a simple procedure to harness unlabeled video sequences and extra images to surpass state-of-the-art performance on core computer vision tasks.

Searching for MobileNetV3

We achieve new state of the art results for mobile classification, detection and segmentation.

FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation

Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use.

Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation

Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space.

Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks.

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information.

MobileNetV2: Inverted Residuals and Linear Bottlenecks

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes.

Rethinking Atrous Convolution for Semantic Image Segmentation

To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates.
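
To make the idea of multiple atrous rates concrete, here is a minimal one-dimensional NumPy sketch. It is a simplification for illustration, not the paper's module: the paper operates on 2D feature maps inside a deep network, and the names `atrous_conv1d` and `parallel_atrous` are hypothetical. Each branch applies the same 3-tap filter with a different spacing (rate) between taps, so branches with larger rates cover a wider field of view at the same cost.

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """'Same'-padded 1D atrous (dilated) convolution for an odd-length filter.

    Filter taps are spaced `rate` samples apart, so the field of view is
    rate * (len(w) - 1) + 1 samples.
    """
    k = len(w)
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)  # zero padding on both sides
    out = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            out[i] += w[j] * xp[i + j * rate]
    return out

def parallel_atrous(x, w, rates):
    """Apply the same filter at several rates and stack the branch outputs."""
    return np.stack([atrous_conv1d(x, w, r) for r in rates])

# A unit impulse shows which input positions each branch can "see".
x = np.zeros(9)
x[4] = 1.0
w = np.array([1.0, 1.0, 1.0])

branches = parallel_atrous(x, w, rates=(1, 2, 4))
for rate, row in zip((1, 2, 4), branches):
    print(rate, np.nonzero(row)[0].tolist())
```

With the impulse at index 4, the rate-1 branch responds at indices 3 to 5, the rate-2 branch at 2, 4 and 6, and the rate-4 branch at 0, 4 and 8, which shows how larger rates widen the context without adding filter weights.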

Mask R-CNN

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

