
# Deep-MAC

Introduced by Birodkar et al. in *The Surprising Impact of Mask-head Architecture on Novel Class Segmentation*

Deep-MAC, or Deep Mask-heads Above CenterNet, is a type of anchor-free instance segmentation model based on CenterNet. The motivation for this new architecture is that boxes are much cheaper to annotate than masks, so the authors address the “partially supervised” instance segmentation problem, where all classes have bounding box annotations but only a subset of classes have mask annotations.

For predicting bounding boxes, CenterNet outputs three tensors: (1) a class-specific heatmap indicating the probability that the center of a bounding box is present at each location, (2) a class-agnostic 2-channel tensor giving the height and width of the bounding box at each center pixel, and (3) a 2-channel x/y offset at each center pixel that corrects the discretization error introduced because the output feature map is smaller than the input image (stride 4 or 8).
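The decoding of one detection from these three tensors can be sketched as follows. This is a minimal NumPy illustration; all shapes, names, and the random "predictions" are assumptions for the example, not the authors' implementation.

```python
import numpy as np

# Illustrative shapes: 80 classes, a 128 x 128 output map at stride 4.
stride = 4
num_classes, H, W = 80, 128, 128

rng = np.random.default_rng(0)
heatmap = rng.random((num_classes, H, W))  # (1) per-class center probabilities
size    = rng.random((2, H, W)) * 50       # (2) class-agnostic height/width
offset  = rng.random((2, H, W))            # (3) sub-stride x/y offset

# Take the single most confident center across all classes and locations.
cls, cy, cx = np.unravel_index(heatmap.argmax(), heatmap.shape)
h, w = size[:, cy, cx]
dx, dy = offset[:, cy, cx]

# Recover the image-space center (undoing the stride, plus the predicted
# offset), then the box corners from the predicted height and width.
x_c = (cx + dx) * stride
y_c = (cy + dy) * stride
box = (x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)
```

In a full decoder this peak-picking step would be repeated per class (typically via local-maximum pooling on the heatmap) rather than once globally.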

For Deep-MAC, in parallel to the box-related prediction heads, we add a fourth pixel-embedding branch $P$. For each bounding box $b$, we crop a region $P_{b}$ from $P$ corresponding to $b$ via ROIAlign, which yields a 32 × 32 tensor. Each $P_{b}$ is fed to a mask-head, whose final output is a class-agnostic 32 × 32 tensor that we pass through a sigmoid to obtain per-pixel probabilities. The mask-head is trained with a per-pixel cross-entropy loss averaged over all pixels and instances. During post-processing, the predicted mask is re-aligned to the predicted box and resized to the image resolution.
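The crop-then-score pipeline above can be sketched in NumPy. The crop here uses nearest-neighbour sampling as a simplified stand-in for ROIAlign (which uses bilinear interpolation), and all function names and shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def crop_and_resize(P, box, out=32):
    # Simplified stand-in for ROIAlign: nearest-neighbour sampling of the
    # pixel-embedding map P (C, H, W) inside box = (x0, y0, x1, y1),
    # producing a fixed (C, out, out) crop.
    x0, y0, x1, y1 = box
    xs = np.clip(np.linspace(x0, x1, out).astype(int), 0, P.shape[2] - 1)
    ys = np.clip(np.linspace(y0, y1, out).astype(int), 0, P.shape[1] - 1)
    return P[:, ys][:, :, xs]

def mask_loss(logits, target):
    # Per-pixel binary cross-entropy on the sigmoid probabilities,
    # averaged over all pixels of one instance.
    p = sigmoid(logits)
    eps = 1e-7
    return float(-np.mean(target * np.log(p + eps)
                          + (1 - target) * np.log(1 - p + eps)))
```

During training this loss would additionally be averaged over all instances in the batch, as described above.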

In addition to this 32 × 32 cropped feature map, we add two inputs that improve the stability of some mask-heads: (1) Instance embedding: an additional head on the backbone predicts a per-pixel embedding. For each bounding box $b$, we extract the embedding at its center pixel, tile it to 32 × 32, and concatenate it to the pixel-embedding crop. This conditions the mask-head on a particular instance and disambiguates it from others. (2) Coordinate embedding: inspired by CoordConv, the authors concatenate a 32 × 32 × 2 tensor holding normalized $\left(x, y\right)$ coordinates relative to the bounding box $b$.
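Assembling the full mask-head input can be sketched as below. Channel counts and the dummy embedding values are illustrative assumptions; only the tiling, coordinate-grid, and concatenation structure follow the description above.

```python
import numpy as np

C_pix, C_inst = 16, 8
crop = np.zeros((C_pix, 32, 32))       # ROIAlign-ed pixel-embedding crop

# (1) Instance embedding: the embedding read out at the box's center
# pixel, tiled across the full 32 x 32 spatial extent.
inst = np.arange(C_inst, dtype=float)  # dummy center-pixel embedding
inst_tiled = np.broadcast_to(inst[:, None, None], (C_inst, 32, 32))

# (2) Coordinate embedding: normalized (x, y) coordinates relative to
# the box, in CoordConv style.
ys, xs = np.meshgrid(np.linspace(0, 1, 32), np.linspace(0, 1, 32),
                     indexing="ij")
coords = np.stack([xs, ys])            # (2, 32, 32)

# Concatenate all three along the channel axis for the mask-head.
mask_head_input = np.concatenate([crop, inst_tiled, coords])
```

The mask-head then sees the same instance vector at every spatial position, which is what lets it separate the target instance from neighbours overlapping the crop.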
