Mask R-CNN

# Summary

**Mask R-CNN** extends [Faster R-CNN](http://paperswithcode.com/method/faster-r-cnn) to solve instance segmentation tasks. It achieves this by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. In principle, Mask R-CNN is an intuitive extension of Faster R-CNN, but constructing the mask branch properly is critical for good results.

Most importantly, Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs. This is evident in how [RoIPool](http://paperswithcode.com/method/roi-pooling), the *de facto* core operation for attending to instances, performs coarse spatial quantization for feature extraction. To fix the misalignment, Mask R-CNN utilises a simple, quantization-free layer, called [RoIAlign](http://paperswithcode.com/method/roi-align), that faithfully preserves exact spatial locations.
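
The difference between a quantized lookup and an aligned one can be illustrated with a small, self-contained sketch. This is a simplification for intuition only, not the torchvision implementation: `sample_quantized`, `sample_bilinear`, and `feature_map` are hypothetical names, the quantized lookup snaps to the grid by flooring (an assumption of this sketch), and a full RoIAlign layer additionally samples several points per output bin and aggregates them.

```python
import math


def sample_quantized(feature, y, x):
    """RoIPool-style lookup (simplified): snap the fractional coordinate to a grid cell."""
    return feature[int(math.floor(y))][int(math.floor(x))]


def sample_bilinear(feature, y, x):
    """RoIAlign-style lookup (simplified): bilinearly interpolate the four nearest cells."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    # Clamp the neighbouring indices so coordinates on the border stay in range.
    y1 = min(y0 + 1, len(feature) - 1)
    x1 = min(x0 + 1, len(feature[0]) - 1)
    wy, wx = y - y0, x - x0  # fractional offsets inside the cell
    top = (1 - wx) * feature[y0][x0] + wx * feature[y0][x1]
    bottom = (1 - wx) * feature[y1][x0] + wx * feature[y1][x1]
    return (1 - wy) * top + wy * bottom


# A toy 3x3 single-channel feature map.
feature_map = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]

quantized_value = sample_quantized(feature_map, 0.5, 0.5)
aligned_value = sample_bilinear(feature_map, 0.5, 0.5)
```

At the fractional location `(0.5, 0.5)` the quantized lookup returns the value of cell `(0, 0)` alone, whereas the bilinear lookup blends all four neighbouring cells, so the sampled feature reflects where the region actually lies.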

## How do I load this model?

To load a pretrained model:

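```python
import torchvision.models as models
# Build Mask R-CNN (ResNet-50 + FPN backbone) and download its pretrained weights.
# torchvision 0.13+ deprecates `pretrained=True` in favour of `weights="DEFAULT"`.
maskrcnn_resnet50_fpn = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
```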

Replace the model name with the variant you want to use, e.g. `maskrcnn_resnet50_fpn`. You can find the IDs in the model summaries at the top of this page.
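
Once loaded, the model can be run on an image roughly as sketched below. This is a minimal sketch, not an official recipe: `filter_predictions` and `run_inference` are hypothetical helper names, and both `0.5` thresholds are illustrative assumptions rather than values taken from this page. It relies on torchvision detection models accepting a list of `(3, H, W)` float tensors in `[0, 1]` in evaluation mode and returning one dictionary per image with `boxes`, `labels`, `scores`, and soft `masks` of shape `(N, 1, H, W)`.

```python
import torch


def filter_predictions(prediction, score_threshold=0.5, mask_threshold=0.5):
    """Keep confident detections and binarize their soft masks.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    keep = prediction["scores"] >= score_threshold
    return {
        "boxes": prediction["boxes"][keep],
        "labels": prediction["labels"][keep],
        "scores": prediction["scores"][keep],
        # Soft masks have shape (N, 1, H, W) with values in [0, 1].
        "masks": prediction["masks"][keep] > mask_threshold,
    }


def run_inference(image):
    """Run the pretrained model on one (3, H, W) float image with values in [0, 1]."""
    # Imported here so that filter_predictions only needs torch.
    import torchvision.models as models

    model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    model.eval()  # evaluation mode: the model returns predictions, not losses
    with torch.no_grad():
        (prediction,) = model([image])  # the model takes a list of images
    return filter_predictions(prediction)
```

Calling `run_inference(torch.rand(3, 480, 640))` exercises the pipeline end to end on random input; for real images, convert them to float tensors scaled to `[0, 1]` first.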

To evaluate the model, use the [object detection recipes](https://github.com/pytorch/vision/tree/master/references/detection) from the library.

## How do I train this model?

You can follow the [torchvision recipe](https://github.com/pytorch/vision/tree/master/references/detection) on GitHub to train a new model from scratch.

## Citation

```BibTeX
@article{DBLP:journals/corr/HeGDG17,
  author    = {Kaiming He and
               Georgia Gkioxari and
               Piotr Doll{\'{a}}r and
               Ross B. Girshick},
  title     = {Mask {R-CNN}},
  journal   = {CoRR},
  volume    = {abs/1703.06870},
  year      = {2017},
  url       = {http://arxiv.org/abs/1703.06870},
  archivePrefix = {arXiv},
  eprint    = {1703.06870},
  timestamp = {Mon, 13 Aug 2018 16:46:36 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/HeGDG17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

| Field | Value |
|---|---|
| Training Techniques | Weight Decay, SGD with Momentum |
| Architecture | RPN, RoIAlign, FPN, Feedforward Network, 1x1 Convolution, Bottleneck Residual Block, Batch Normalization, Convolution, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax, Non Maximum Suppression |
| ID | `maskrcnn_resnet50_fpn` |
| LR | 0.02 |
| Epochs | 26 |
| LR Steps | 16, 22 |
| Momentum | 0.9 |
| Batch Size | 2 |
| Memory (GB) | 5.4 |
| LR Step Size | 8 |
| Weight Decay | 0.0001 |
| Train Time (s/im) | 0.2728 |
| Inference Time (s/im) | 0.0903 |
| Aspect Ratio Group Factor | 3 |
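
To make the listed schedule concrete, the sketch below computes a per-epoch learning rate from the LR, Epochs, and LR Steps values above. It assumes a multi-step schedule in which the rate is multiplied by a decay factor at each listed step; the factor `gamma=0.1` is an assumption that does not appear on this page, `multistep_lr` is a hypothetical helper name, and the LR Step Size value is not used here.

```python
def multistep_lr(epoch, base_lr=0.02, milestones=(16, 22), gamma=0.1):
    """Learning rate in effect during a 0-indexed epoch.

    base_lr and milestones come from the table above; gamma is an assumed value.
    """
    # Count how many milestones have already been reached.
    drops = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** drops


# One learning rate per epoch for the 26-epoch run.
schedule = [multistep_lr(epoch) for epoch in range(26)]
```

Under these assumptions the rate stays at its base value for epochs 0 to 15, drops by one factor of `gamma` for epochs 16 to 21, and by a second factor for the remaining epochs.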