K-Net

Introduced by Zhang et al. in K-Net: Towards Unified Image Segmentation

K-Net is a framework for unified semantic and instance segmentation that segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. It begins with a set of kernels that are randomly initialized, and learns the kernels in accordance to the segmentation targets at hand, namely, semantic kernels for semantic categories and instance kernels for instance identities. A simple combination of semantic kernels and instance kernels allows panoptic segmentation naturally. In the forward pass, the kernels perform convolution on the image features to obtain the corresponding segmentation predictions.

K-Net is formulated so that it dynamically updates the kernels to make them conditional to their activations on the image. Such a content-aware mechanism is crucial to ensure that each kernel, especially an instance kernel, responds accurately to varying objects in an image. Through applying this adaptive kernel update strategy iteratively, K-Net significantly improves the discriminative ability of the kernels and boosts the final segmentation performance. It is noteworthy that this strategy universally applies to kernels for all the segmentation tasks.

It also utilises a bipartite matching strategy to assign learning targets for each kernel. This training approach is advantageous to conventional training strategies as it builds a one-to-one mapping between kernels and instances in an image. It thus resolves the problem of dealing with a varying number of instances in an image. In addition, it is purely mask-driven without involving boxes. Hence, K-Net is naturally NMS-free and box-free, which is appealing to real-time applications.

Source: K-Net: Towards Unified Image Segmentation

Read Paper See Code

Papers

Paper	Code	Results	Date	Stars

Tasks

Task	Papers	Share
Instance Segmentation	3	15.00%
Panoptic Segmentation	3	15.00%
Semantic Segmentation	3	15.00%
Video Instance Segmentation	2	10.00%
Video Panoptic Segmentation	2	10.00%
Video Semantic Segmentation	2	10.00%
Image Segmentation	2	10.00%
Scene Parsing	1	5.00%
Decoder	1	5.00%

Usage Over Time

This feature is experimental; we are continuously improving our matching algorithm.

Components

Component	Type	Add Remove
🤖 No Components Found	You can add them if they exist; e.g. Mask R-CNN uses RoIAlign

Categories

Add Remove

Semantic Segmentation Models

Instance Segmentation Models