Mask R-CNN

Last updated on Feb 12, 2021

Mask R-CNN ResNet-50 FPN

Parameters: 44 million
FLOPs: 447 billion
File Size: 169.84 MB
Training Data: MS COCO
Training Resources: 8x NVIDIA V100 GPUs
Training Time:
Training Techniques: Weight Decay, SGD with Momentum
Architecture: RPN, RoIAlign, FPN, Feedforward Network, 1x1 Convolution, Bottleneck Residual Block, Batch Normalization, Convolution, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax, Non Maximum Suppression
ID: maskrcnn_resnet50_fpn
LR: 0.02
Epochs: 26
LR Steps: 16, 22
Momentum: 0.9
Batch Size: 2
Memory (GB): 5.4
LR Step Size: 8
Weight Decay: 0.0001
Train Time (s/image): 0.2728
Inference Time (s/image): 0.0903
Aspect Ratio Group Factor: 3
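The optimizer settings listed for this model (SGD with momentum 0.9, weight decay 0.0001, base LR 0.02 dropped at epochs 16 and 22 over 26 epochs) can be sketched in plain Python. This is an illustrative sketch, not the torchvision training code: it assumes PyTorch's SGD convention (weight decay added to the gradient, a momentum buffer without dampening or Nesterov) and a decay factor of 0.1 at each LR step, which the card does not state. Any warmup at the start of training is omitted, and the function names are hypothetical.

```python
# Hyperparameters copied from the model card.
BASE_LR = 0.02
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0001
EPOCHS = 26
LR_STEPS = (16, 22)
GAMMA = 0.1  # assumption: the card does not list the decay factor


def lr_at_epoch(epoch, base_lr=BASE_LR, steps=LR_STEPS, gamma=GAMMA):
    """Multi-step schedule: multiply the LR by gamma once per milestone reached."""
    passed = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** passed


def sgd_momentum_step(params, grads, velocity, lr,
                      momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One SGD update in the PyTorch style (no dampening, no Nesterov).

    d = g + weight_decay * p;  v = momentum * v + d;  p = p - lr * v
    Returns the updated parameters and momentum buffers as new lists.
    """
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        d = g + weight_decay * p   # L2 weight decay folded into the gradient
        v = momentum * v + d       # momentum buffer
        new_params.append(p - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity


if __name__ == "__main__":
    # Print the learning rate around each LR step.
    for epoch in (0, 15, 16, 21, 22, EPOCHS - 1):
        print(epoch, lr_at_epoch(epoch))
```

Under these assumptions the learning rate is 0.02 for epochs 0 to 15, 0.002 for epochs 16 to 21 and 0.0002 from epoch 22 onwards (epochs counted from 0).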

Summary

Mask R-CNN extends Faster R-CNN to solve instance segmentation tasks. It achieves this by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. In principle, Mask R-CNN is an intuitive extension of Faster R-CNN, but constructing the mask branch properly is critical for good results.
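The parallel mask branch is trained with its own loss term. Following the description in the Mask R-CNN paper, the mask head outputs one m x m logit map per class and applies a per-pixel sigmoid; for each RoI only the map of the ground-truth class contributes an averaged binary cross-entropy, so classes do not compete through a softmax. The pure-Python sketch below computes that term for a single RoI; the function names are hypothetical and nested lists stand in for tensors.

```python
import math


def bce_with_logits(z, y):
    """Numerically stable -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel binary cross-entropy for one RoI.

    mask_logits: K maps (one per class), each an m x m list of logits.
    gt_class:    index k of the RoI's ground-truth class.
    gt_mask:     m x m list of 0/1 targets.
    Only mask_logits[gt_class] is used: the other K-1 maps get no loss.
    """
    logits = mask_logits[gt_class]
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, gt_mask):
        for z, y in zip(logit_row, target_row):
            total += bce_with_logits(z, y)
            count += 1
    return total / count
```

Because the loss ignores every map except the ground-truth class's, the box branch's classification decides *which* mask is used at inference, while the mask branch only has to separate foreground from background.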

Most importantly, Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs. This is evident in how RoIPool, the de facto core operation for attending to instances, performs coarse spatial quantization for feature extraction. To fix the misalignment, Mask R-CNN utilises a simple, quantization-free layer, called RoIAlign, that faithfully preserves exact spatial locations.
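The effect of quantization can be seen on a toy feature map. The sketch below, with hypothetical function names, computes a single output bin two ways: RoIAlign-style, by averaging bilinearly interpolated samples taken at real-valued positions (a 2 x 2 grid of sample points here), and RoIPool-style, by first snapping the RoI boundaries to whole cells. To isolate the quantization step, the RoIPool-style function averages the covered cells instead of taking their maximum as real RoIPool does, and its rounding rule is a simplification.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D map (list of rows) at real-valued (y, x).
    Assumes 0 <= y <= H-1 and 0 <= x <= W-1."""
    h, w = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_bin(feature, y_start, x_start, y_end, x_end, samples=2):
    """RoIAlign-style bin: average samples x samples bilinear samples.
    The bin boundaries stay real-valued; nothing is rounded."""
    step_h = (y_end - y_start) / samples
    step_w = (x_end - x_start) / samples
    values = [bilinear(feature,
                       y_start + (i + 0.5) * step_h,
                       x_start + (j + 0.5) * step_w)
              for i in range(samples) for j in range(samples)]
    return sum(values) / len(values)


def snap(v):
    """Round half up to the nearest integer cell index."""
    return int(math.floor(v + 0.5))


def roi_pool_bin_snapped(feature, y_start, x_start, y_end, x_end):
    """Simplified RoIPool-style bin: snap boundaries to cells, then average."""
    rows = range(snap(y_start), snap(y_end) + 1)  # inclusive end
    cols = range(snap(x_start), snap(x_end) + 1)
    values = [feature[r][c] for r in rows for c in cols]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Feature value equals the column index, so a bin's value reveals its x-position.
    ramp = [[float(c) for c in range(4)] for _ in range(4)]
    print(roi_align_bin(ramp, 0.0, 0.6, 3.0, 2.0))
    print(roi_pool_bin_snapped(ramp, 0.0, 0.6, 3.0, 2.0))
```

On this ramp an RoI spanning columns 0.6 to 2.0 has its centre at 1.3. The RoIAlign-style bin returns 1.3 (up to floating-point error), while the snapped version covers columns 1 and 2 and returns 1.5, the kind of sub-cell shift that RoIAlign avoids.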

How do I load this model?

To load a pretrained model:

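import torchvision.models as models
# COCO-pretrained weights; torchvision >= 0.13 prefers weights="DEFAULT" over pretrained=True
maskrcnn_resnet50_fpn = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
maskrcnn_resnet50_fpn.eval()  # inference mode: returns predictions instead of training losses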

Replace the model name with the variant you want to use, e.g. maskrcnn_resnet50_fpn. You can find the IDs in the model summaries at the top of this page.
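In evaluation mode (`model.eval()`), torchvision's detection models return one dictionary per input image with `boxes` (x1, y1, x2, y2), `labels`, `scores` and `masks`, where `masks` has shape N x 1 x H x W and holds soft values in [0, 1]. The sketch below keeps detections above a score threshold and binarizes their masks at 0.5, the threshold torchvision's documentation suggests as typical. It works on plain Python lists (for example, the result of calling `.tolist()` on each tensor) so it runs without PyTorch; the function name and the 0.5 score threshold are illustrative choices, not part of the library.

```python
def filter_predictions(prediction, score_threshold=0.5, mask_threshold=0.5):
    """Keep confident detections and turn their soft masks into binary masks.

    prediction: dict with "boxes" (N x 4), "labels" (N), "scores" (N) and
    "masks" (N x 1 x H x W), all as nested Python lists.
    """
    kept = []
    for box, label, score, mask in zip(prediction["boxes"], prediction["labels"],
                                       prediction["scores"], prediction["masks"]):
        if score < score_threshold:
            continue  # drop low-confidence detections
        soft = mask[0]  # remove the singleton channel dimension
        binary = [[1 if v >= mask_threshold else 0 for v in row] for row in soft]
        kept.append({"box": box, "label": label, "score": score, "mask": binary})
    return kept
```

With real tensors the same steps are usually written directly in PyTorch (indexing by `scores > threshold` and comparing `masks >= 0.5`); the list version only makes the data layout explicit.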

To evaluate the model, use the object detection recipes from the torchvision library.

How do I train this model?

You can follow the torchvision recipe on GitHub to train a new model from scratch.

Citation

@article{DBLP:journals/corr/HeGDG17,
  author    = {Kaiming He and
               Georgia Gkioxari and
               Piotr Doll{\'{a}}r and
               Ross B. Girshick},
  title     = {Mask {R-CNN}},
  journal   = {CoRR},
  volume    = {abs/1703.06870},
  year      = {2017},
  url       = {http://arxiv.org/abs/1703.06870},
  archivePrefix = {arXiv},
  eprint    = {1703.06870},
  timestamp = {Mon, 13 Aug 2018 16:46:36 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/HeGDG17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Results

Object Detection on COCO minival

Object Detection
BENCHMARK     MODEL                     METRIC NAME  METRIC VALUE  GLOBAL RANK
COCO minival  Mask R-CNN ResNet-50 FPN  box AP       37.9          #95

Instance Segmentation
BENCHMARK     MODEL                     METRIC NAME  METRIC VALUE  GLOBAL RANK
COCO minival  Mask R-CNN ResNet-50 FPN  mask AP      34.6          #52
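Box AP and mask AP are both COCO average-precision scores; they differ in how a prediction is matched to a ground-truth object. Box AP uses the intersection over union (IoU) of bounding boxes, mask AP the IoU of pixel masks. A minimal pure-Python sketch of the two overlap measures follows, with boxes as (x1, y1, x2, y2) and masks as equal-sized 0/1 grids; the function names are illustrative. The full COCO metric, which averages precision over IoU thresholds from 0.50 to 0.95, is normally computed with the pycocotools package rather than by hand.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(m1, m2):
    """IoU of two binary masks given as equal-sized lists of 0/1 rows."""
    inter = union = 0
    for r1, r2 in zip(m1, m2):
        for p, q in zip(r1, r2):
            inter += p & q  # pixel in both masks
            union += p | q  # pixel in either mask
    return inter / union if union else 0.0
```

Two detections can have well-overlapping boxes yet poorly overlapping masks, which is why the two scores are reported separately.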