Instance Segmentation
839 papers with code • 21 benchmarks • 74 datasets
Instance Segmentation is a computer vision task that detects each individual object in an image, delineates its boundary at the pixel level, and assigns it a unique instance label. The goal is a pixel-wise segmentation map in which each object pixel is assigned to a specific instance, so that separate objects of the same class remain distinguishable.
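As a minimal, dependency-free sketch of what such a map can look like, the following merges per-instance binary masks into a single grid of instance IDs. The function name `masks_to_instance_map`, the use of 0 for background, and the rule that the earlier mask wins on overlap are illustrative assumptions, not part of any specific benchmark format.

```python
def masks_to_instance_map(masks):
    """Merge per-instance binary masks into one map of instance IDs.

    masks: list of H x W grids (lists of lists) holding 0 or 1.
    Returns an H x W grid where 0 is background and k >= 1 marks the
    k-th instance. Assumption: on overlap, the earlier mask wins.
    """
    if not masks:
        return []
    height, width = len(masks[0]), len(masks[0][0])
    instance_map = [[0] * width for _ in range(height)]
    for instance_id, mask in enumerate(masks, start=1):
        for y in range(height):
            for x in range(width):
                # Only claim pixels that no earlier instance has taken.
                if mask[y][x] and instance_map[y][x] == 0:
                    instance_map[y][x] = instance_id
    return instance_map


# Two toy instances on a 3 x 4 image; they overlap at row 1, column 1.
mask_a = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0]]
mask_b = [[0, 0, 0, 0],
          [0, 1, 1, 1],
          [0, 0, 1, 1]]
instance_map = masks_to_instance_map([mask_a, mask_b])
```

Unlike a semantic segmentation map, the two objects receive different IDs even when they belong to the same class.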
Image Credit: Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers, CVPR'21
Libraries
Use these libraries to find Instance Segmentation models and implementations
Subtasks
- Referring Expression Segmentation
- 3D Instance Segmentation
- Real-time Instance Segmentation
- Unsupervised Object Segmentation
- Box-supervised Instance Segmentation
- Amodal Instance Segmentation
- Image-level Supervised Instance Segmentation
- Unseen Object Instance Segmentation
- 3D Semantic Instance Segmentation
- Human Instance Segmentation
- One-Shot Instance Segmentation
- Semi-Supervised Person Instance Segmentation
- Point-Supervised Instance Segmentation
- Solar Cell Segmentation
Most implemented papers
Mask R-CNN
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
MMDetection: Open MMLab Detection Toolbox and Benchmark
In this paper, we introduce the various features of this toolbox.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.
YOLACT: Real-time Instance Segmentation
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
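The linear combination described above can be sketched in plain Python. This is an illustration under stated assumptions, not the YOLACT implementation: the function name `assemble_mask`, the toy shapes, and the 0.5 threshold are hypothetical choices, and the sigmoid stands in for the nonlinearity applied after the combination.

```python
import math


def assemble_mask(prototypes, coefficients, threshold=0.5):
    """Combine prototype masks into one binary instance mask.

    prototypes: list of k grids, each H x W, of real-valued activations.
    coefficients: list of k real-valued mask coefficients for one instance.
    Returns an H x W grid with 1 where sigmoid(sum_k c_k * P_k) > threshold.
    """
    if len(prototypes) != len(coefficients):
        raise ValueError("need one coefficient per prototype")
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Weighted sum of all prototypes at this pixel.
            logit = sum(c * p[y][x] for c, p in zip(coefficients, prototypes))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask[y][x] = 1 if prob > threshold else 0
    return mask


# Two toy 2 x 3 prototypes: one fires on the left, one on the right.
proto_left = [[4.0, 4.0, 0.0],
              [4.0, 4.0, 0.0]]
proto_right = [[0.0, 4.0, 4.0],
               [0.0, 4.0, 4.0]]
# A positive weight keeps a prototype; a negative weight subtracts it.
mask = assemble_mask([proto_left, proto_right], [1.0, -1.0])
```

With these toy values, only the column where the left prototype fires alone survives, which shows how negative coefficients let one prototype carve regions out of another.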
Deep High-Resolution Representation Learning for Visual Recognition
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Deep High-Resolution Representation Learning for Human Pose Estimation
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel.
YOLACT++: Better Real-time Instance Segmentation
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
ResNeSt: Split-Attention Networks
It is well known that featuremap attention and multi-path representation are important for visual recognition.
Non-local Neural Networks
Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.