Faster R-CNN

Last updated on Feb 12, 2021

Faster R-CNN MobileNetV3-Large 320 FPN

Parameters 19 Million
FLOPs 7 Billion
File Size 74.24 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Training Techniques Weight Decay, SGD with Momentum
Architecture RPN, RoIPool, FPN, Feedforward Network, 1x1 Convolution, Batch Normalization, Convolution, Dense Connections, Depthwise Separable Convolution, Dropout, Global Average Pooling, Hard Swish, Inverted Residual Block, Residual Connection, ReLU, Softmax, Squeeze-and-Excitation Block
ID fasterrcnn_mobilenet_v3_large_320_fpn
LR 0.02
Epochs 26
LR Steps 16, 22
Momentum 0.9
Batch Size 2
LR Step Size 8
Weight Decay 0.0001
inference time (s/im) 0.1679
Aspect Ratio Group Factor 3
Faster R-CNN MobileNetV3-Large FPN

Parameters 19 Million
FLOPs 44 Billion
File Size 74.24 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Training Techniques Weight Decay, SGD with Momentum
Architecture RPN, RoIPool, FPN, Feedforward Network, 1x1 Convolution, Batch Normalization, Convolution, Dense Connections, Depthwise Separable Convolution, Dropout, Global Average Pooling, Hard Swish, Inverted Residual Block, Residual Connection, ReLU, Softmax, Squeeze-and-Excitation Block
ID fasterrcnn_mobilenet_v3_large_fpn
LR 0.02
Epochs 26
LR Steps 16, 22
Momentum 0.9
Batch Size 2
LR Step Size 8
Weight Decay 0.0001
inference time (s/im) 0.8409
Aspect Ratio Group Factor 3
Faster R-CNN ResNet-50 FPN

Parameters 42 Million
FLOPs 447 Billion
File Size 159.74 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Training Techniques Weight Decay, SGD with Momentum
Architecture RPN, RoIPool, FPN, Feedforward Network, 1x1 Convolution, Bottleneck Residual Block, Batch Normalization, Convolution, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax, Non Maximum Suppression
ID fasterrcnn_resnet50_fpn
LR 0.02
Epochs 26
LR Steps 16, 22
Momentum 0.9
Batch Size 2
Memory (GB) 5.2
LR Step Size 8
Weight Decay 0.0001
train time (s/im) 0.2288
inference time (s/im) 0.059
Aspect Ratio Group Factor 3

Summary

Faster R-CNN is an object detection model that improves on Fast R-CNN by adding a region proposal network (RPN) to the CNN model. The RPN shares full-image convolutional features with the detection network, which makes region proposals nearly cost-free. The RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. It is trained end-to-end to generate high-quality region proposals, which Fast R-CNN then uses for detection. The RPN and Fast R-CNN are merged into a single network by sharing their convolutional features, so the RPN component tells the unified network where to look.

As a whole, Faster R-CNN consists of two modules. The first module is a deep fully convolutional network that proposes regions, and the second module is the Fast R-CNN detector that uses the proposed regions.
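To make "predicts object bounds and objectness scores at each position" more concrete, here is a minimal sketch of how reference boxes (anchors) can be laid out over a feature map. It assumes the 3 scales and 3 aspect ratios described in the original Faster R-CNN paper and a hypothetical stride of 16. The function names `base_anchors` and `grid_anchors` are illustrative, and this is not torchvision's anchor generator.

```python
import math


def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred on the origin.

    ratio is height / width; each anchor keeps an area of scale ** 2.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def grid_anchors(feat_h, feat_w, stride=16):
    """Shift the base anchors to every feature-map position."""
    out = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx, cy = col * stride, row * stride
            for x1, y1, x2, y2 in base_anchors():
                out.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return out


anchors = grid_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 positions * 9 anchors = 54
```

For each of these anchors the RPN would output one objectness score and four box offsets; the second module then classifies and refines the highest-scoring proposals.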

How do I load this model?

To load a pretrained model:

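import torchvision.models as models
# Downloads COCO-pretrained weights on first use; newer torchvision releases use a `weights=` argument instead of `pretrained=True`.
fasterrcnn_resnet50_fpn = models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
fasterrcnn_resnet50_fpn.eval()  # switch to inference mode before running predictions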

Replace the model name with the ID of the variant you want to use, e.g. fasterrcnn_mobilenet_v3_large_320_fpn. The IDs are listed in the model summaries at the top of this page.

To evaluate the model, use the object detection recipes from the library.
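In evaluation mode, torchvision detection models return one dictionary per input image with `boxes`, `labels`, and `scores` entries. The sketch below shows one way to keep only confident detections. To stay runnable without downloading weights, it works on plain Python lists (for example, obtained by calling `.tolist()` on the output tensors). The helper `filter_detections`, the 0.5 threshold, and the example values are illustrative assumptions, not part of the library.

```python
def filter_detections(detection, score_threshold=0.5):
    """Keep detections whose score is at least score_threshold.

    `detection` mimics one element of the model output:
    {"boxes": [[x1, y1, x2, y2], ...], "labels": [...], "scores": [...]}.
    """
    keep = [i for i, s in enumerate(detection["scores"]) if s >= score_threshold]
    return {
        "boxes": [detection["boxes"][i] for i in keep],
        "labels": [detection["labels"][i] for i in keep],
        "scores": [detection["scores"][i] for i in keep],
    }


# Hypothetical output for one image (values made up for illustration).
example = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0]],
    "labels": [1, 18],
    "scores": [0.92, 0.31],
}
confident = filter_detections(example)
print(confident["labels"])  # [1]
```

A lower threshold keeps more boxes at the cost of more false positives, so the value is usually tuned per application.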

How do I train this model?

You can follow the torchvision recipe on GitHub to train a new model from scratch.
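The hyperparameters in the model summaries (LR 0.02, 26 epochs, LR Steps 16 and 22) imply a step-wise learning-rate schedule. As a rough sketch, assuming the rate is multiplied by 0.1 at each listed LR step (the decay factor is not given on this page, so `gamma` is an assumption) and that "Batch Size 2" is per GPU, the schedule can be written out as follows. The helper `lr_at_epoch` is hypothetical and only mirrors a multi-step decay.

```python
def lr_at_epoch(epoch, base_lr=0.02, milestones=(16, 22), gamma=0.1):
    """Learning rate for a 0-indexed epoch under a multi-step decay."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


EPOCHS = 26
schedule = [lr_at_epoch(e) for e in range(EPOCHS)]

# Assumption: "Batch Size 2" is per GPU, so 8 GPUs give 16 images per step.
effective_batch_size = 8 * 2

# Print the rate just before and after each decay point.
for epoch in (0, 15, 16, 21, 22, 25):
    print(epoch, f"{schedule[epoch]:.5f}")
```

If you train on fewer GPUs, a common heuristic is to scale the learning rate in proportion to the effective batch size, though the right value should be checked empirically.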

Citation

@article{DBLP:journals/corr/RenHG015,
  author    = {Shaoqing Ren and
               Kaiming He and
               Ross B. Girshick and
               Jian Sun},
  title     = {Faster {R-CNN:} Towards Real-Time Object Detection with Region Proposal
               Networks},
  journal   = {CoRR},
  volume    = {abs/1506.01497},
  year      = {2015},
  url       = {http://arxiv.org/abs/1506.01497},
  archivePrefix = {arXiv},
  eprint    = {1506.01497},
  timestamp = {Mon, 13 Aug 2018 16:46:02 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/RenHG015.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

If you use the MobileNet V3 backbone:

@article{DBLP:journals/corr/abs-1905-02244,
  author    = {Andrew Howard and
               Mark Sandler and
               Grace Chu and
               Liang{-}Chieh Chen and
               Bo Chen and
               Mingxing Tan and
               Weijun Wang and
               Yukun Zhu and
               Ruoming Pang and
               Vijay Vasudevan and
               Quoc V. Le and
               Hartwig Adam},
  title     = {Searching for MobileNetV3},
  journal   = {CoRR},
  volume    = {abs/1905.02244},
  year      = {2019},
  url       = {http://arxiv.org/abs/1905.02244},
  archivePrefix = {arXiv},
  eprint    = {1905.02244},
  timestamp = {Tue, 12 Jan 2021 15:30:06 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1905-02244.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Results

Object Detection on COCO minival

Benchmark    | Model                                   | Metric | Value | Global Rank
COCO minival | Faster R-CNN ResNet-50 FPN              | box AP | 37.0  | #103
COCO minival | Faster R-CNN MobileNetV3-Large FPN      | box AP | 32.8  | #115
COCO minival | Faster R-CNN MobileNetV3-Large 320 FPN  | box AP | 22.8  | #121