Mask R-CNN

Last updated on Feb 23, 2021

Mask R-CNN (R-101-FPN, 1x, caffe)

lr sched 1x
Backbone Layers 101
File Size 242.32 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, ResNet, RoIAlign
Mask R-CNN (R-101-FPN, 1x, pytorch)

Memory (M) 6400.0
inference time (s/im) 0.07407
File Size 242.32 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, ResNet, RoIAlign
lr sched 1x
Backbone Layers 101
Mask R-CNN (R-101-FPN, 2x, pytorch)

lr sched 2x
Backbone Layers 101
File Size 242.32 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, ResNet, RoIAlign
Mask R-CNN (R-50-FPN, 1x, caffe)

Memory (M) 4300.0
Backbone Layers 50
File Size 169.63 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, ResNet, RoIAlign
lr sched 1x
Mask R-CNN (R-50-FPN, 1x, pytorch)

Memory (M) 4400.0
inference time (s/im) 0.06211
File Size 169.62 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, ResNet, RoIAlign
lr sched 1x
Backbone Layers 50
Mask R-CNN (R-50-FPN, 2x, caffe, multiscale)

Memory (M) 4300.0
Backbone Layers 50
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, ResNet, RoIAlign
lr sched 2x
Mask R-CNN (R-50-FPN, 2x, pytorch)

lr sched 2x
Backbone Layers 50
File Size 169.63 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, ResNet, RoIAlign
Mask R-CNN (R-50-FPN, 3x, caffe, multiscale)

Memory (M) 4300.0
Backbone Layers 50
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, ResNet, RoIAlign
lr sched 3x
Mask R-CNN (X-101-32x4d-FPN, 1x, pytorch)

Memory (M) 7600.0
inference time (s/im) 0.0885
File Size 241.03 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, RoIAlign
lr sched 1x
Backbone Layers 101
Mask R-CNN (X-101-32x4d-FPN, 2x, pytorch)

lr sched 2x
Backbone Layers 101
File Size 241.03 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, RoIAlign
Mask R-CNN (X-101-32x8d-FPN, 1x, pytorch)

lr sched 1x
Backbone Layers 101
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, RoIAlign
Mask R-CNN (X-101-32x8d-FPN, 1x, pytorch, multiscale)

lr sched 1x
Backbone Layers 101
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, RoIAlign
Mask R-CNN (X-101-32x8d-FPN, 3x, pytorch, multiscale)

lr sched 3x
Backbone Layers 101
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, RoIAlign
Mask R-CNN (X-101-64x4d-FPN, 1x, pytorch)

Memory (M) 10700.0
inference time (s/im) 0.125
File Size 391.11 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, RoIAlign
lr sched 1x
Backbone Layers 101
Mask R-CNN (X-101-64x4d-FPN, 2x, pytorch)

lr sched 2x
Backbone Layers 101
File Size 391.11 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, RoIAlign
Introduction

@article{He_2017,
   title={Mask R-CNN},
   journal={2017 IEEE International Conference on Computer Vision (ICCV)},
   publisher={IEEE},
   author={He, Kaiming and Gkioxari, Georgia and Dollar, Piotr and Girshick, Ross},
   year={2017},
   month={Oct}
}

Results and models

| Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R-50-FPN | caffe | 1x | 4.3 | | 38.0 | 34.4 | config | model \| log |
| R-50-FPN | pytorch | 1x | 4.4 | 16.1 | 38.2 | 34.7 | config | model \| log |
| R-50-FPN | pytorch | 2x | - | - | 39.2 | 35.4 | config | model \| log |
| R-101-FPN | caffe | 1x | | | 40.4 | 36.4 | config | model \| log |
| R-101-FPN | pytorch | 1x | 6.4 | 13.5 | 40.0 | 36.1 | config | model \| log |
| R-101-FPN | pytorch | 2x | - | - | 40.8 | 36.6 | config | model \| log |
| X-101-32x4d-FPN | pytorch | 1x | 7.6 | 11.3 | 41.9 | 37.5 | config | model \| log |
| X-101-32x4d-FPN | pytorch | 2x | - | - | 42.2 | 37.8 | config | model \| log |
| X-101-64x4d-FPN | pytorch | 1x | 10.7 | 8.0 | 42.8 | 38.4 | config | model \| log |
| X-101-64x4d-FPN | pytorch | 2x | - | - | 42.7 | 38.1 | config | model \| log |
| X-101-32x8d-FPN | pytorch | 1x | - | - | 42.8 | 38.3 | | |
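
The model cards report memory in M and inference time in seconds per image (s/im), while the table above uses GB and frames per second (fps). The two appear to be related by a simple unit conversion. The sketch below reproduces the table values from the card values; the helper name `to_table_units` is hypothetical, and dividing by 1000 (rather than 1024) is an assumption chosen because it matches the listed numbers.

```python
# Values copied from the model cards: (memory in M, inference time in s/im).
cards = {
    "R-50-FPN (pytorch, 1x)": (4400.0, 0.06211),
    "R-101-FPN (pytorch, 1x)": (6400.0, 0.07407),
    "X-101-32x4d-FPN (pytorch, 1x)": (7600.0, 0.0885),
    "X-101-64x4d-FPN (pytorch, 1x)": (10700.0, 0.125),
}


def to_table_units(memory_m, sec_per_image):
    """Convert card units to table units.

    Returns (memory in GB, throughput in fps), each rounded to one decimal.
    Assumption: 1 GB = 1000 M, which is what matches the table.
    """
    memory_gb = round(memory_m / 1000, 1)
    fps = round(1.0 / sec_per_image, 1)  # fps is the reciprocal of s/im
    return memory_gb, fps


for name, (memory_m, sec_per_image) in cards.items():
    memory_gb, fps = to_table_units(memory_m, sec_per_image)
    print(f"{name}: {memory_gb} GB, {fps} fps")
```

Rows marked `-` in the table have no corresponding memory or timing values on their cards, so there is nothing to convert for them.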

Pre-trained Models

We also provide models trained with longer schedules and multi-scale training. Users can fine-tune them for downstream tasks.

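| Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R-50-FPN | caffe | 2x | 4.3 | | 40.3 | 36.5 | config | model \| log |
| R-50-FPN | caffe | 3x | 4.3 | | 40.8 | 37.0 | config | model \| log |
| X-101-32x8d-FPN | pytorch | 1x | - | | 43.6 | 39.0 | | |
| X-101-32x8d-FPN | pytorch | 3x | - | | 44.0 | 39.3 | | |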

Results

Object Detection on COCO minival

| MODEL | BOX AP |
| --- | --- |
| Mask R-CNN (X-101-32x8d-FPN, 3x, pytorch, multiscale) | 44.0 |
| Mask R-CNN (X-101-32x8d-FPN, 1x, pytorch, multiscale) | 43.6 |
| Mask R-CNN (X-101-64x4d-FPN, 1x, pytorch) | 42.8 |
| Mask R-CNN (X-101-32x8d-FPN, 1x, pytorch) | 42.8 |
| Mask R-CNN (X-101-64x4d-FPN, 2x, pytorch) | 42.7 |
| Mask R-CNN (X-101-32x4d-FPN, 2x, pytorch) | 42.2 |
| Mask R-CNN (X-101-32x4d-FPN, 1x, pytorch) | 41.9 |
| Mask R-CNN (R-50-FPN, 3x, caffe, multiscale) | 40.8 |
| Mask R-CNN (R-101-FPN, 2x, pytorch) | 40.8 |
| Mask R-CNN (R-101-FPN, 1x, caffe) | 40.4 |
| Mask R-CNN (R-50-FPN, 2x, caffe, multiscale) | 40.3 |
| Mask R-CNN (R-101-FPN, 1x, pytorch) | 40.0 |
| Mask R-CNN (R-50-FPN, 2x, pytorch) | 39.2 |
| Mask R-CNN (R-50-FPN, 1x, pytorch) | 38.2 |
| Mask R-CNN (R-50-FPN, 1x, caffe) | 38.0 |

Instance Segmentation on COCO minival

| MODEL | MASK AP |
| --- | --- |
| Mask R-CNN (X-101-32x8d-FPN, 3x, pytorch, multiscale) | 39.3 |
| Mask R-CNN (X-101-32x8d-FPN, 1x, pytorch, multiscale) | 39.0 |
| Mask R-CNN (X-101-64x4d-FPN, 1x, pytorch) | 38.4 |
| Mask R-CNN (X-101-32x8d-FPN, 1x, pytorch) | 38.3 |
| Mask R-CNN (X-101-64x4d-FPN, 2x, pytorch) | 38.1 |
| Mask R-CNN (X-101-32x4d-FPN, 2x, pytorch) | 37.8 |
| Mask R-CNN (X-101-32x4d-FPN, 1x, pytorch) | 37.5 |
| Mask R-CNN (R-50-FPN, 3x, caffe, multiscale) | 37.0 |
| Mask R-CNN (R-101-FPN, 2x, pytorch) | 36.6 |
| Mask R-CNN (R-50-FPN, 2x, caffe, multiscale) | 36.5 |
| Mask R-CNN (R-101-FPN, 1x, caffe) | 36.4 |
| Mask R-CNN (R-101-FPN, 1x, pytorch) | 36.1 |
| Mask R-CNN (R-50-FPN, 2x, pytorch) | 35.4 |
| Mask R-CNN (R-50-FPN, 1x, pytorch) | 34.7 |
| Mask R-CNN (R-50-FPN, 1x, caffe) | 34.4 |