Faster R-CNN

pytorch / vision

Last updated on Feb 12, 2021

Parameters 19 Million

FLOPs 7 Billion

File Size 74.24 MB

Training Data MS COCO

Training Resources 8x NVIDIA V100 GPUs

Training Time

Training Techniques	Weight Decay, SGD with Momentum
Architecture	RPN, RoIPool, FPN, Feedforward Network, 1x1 Convolution, Batch Normalization, Convolution, Dense Connections, Depthwise Separable Convolution, Dropout, Global Average Pooling, Hard Swish, Inverted Residual Block, Residual Connection, ReLU, Softmax, Squeeze-and-Excitation Block
ID	fasterrcnn_mobilenet_v3_large_320_fpn
LR	0.02
Epochs	26
LR Steps	16, 22
Momentum	0.9
Batch Size	2
LR Step Size	8
Weight Decay	0.0001
inference time (s/im)	0.1679
Aspect Ratio Group Factor	3

Parameters 19 Million

FLOPs 44 Billion

File Size 74.24 MB

Training Data MS COCO

Training Resources 8x NVIDIA V100 GPUs

Training Time

Training Techniques	Weight Decay, SGD with Momentum
Architecture	RPN, RoIPool, FPN, Feedforward Network, 1x1 Convolution, Batch Normalization, Convolution, Dense Connections, Depthwise Separable Convolution, Dropout, Global Average Pooling, Hard Swish, Inverted Residual Block, Residual Connection, ReLU, Softmax, Squeeze-and-Excitation Block
ID	fasterrcnn_mobilenet_v3_large_fpn
LR	0.02
Epochs	26
LR Steps	16, 22
Momentum	0.9
Batch Size	2
LR Step Size	8
Weight Decay	0.0001
inference time (s/im)	0.8409
Aspect Ratio Group Factor	3

Parameters 42 Million

FLOPs 447 Billion

File Size 159.74 MB

Training Data MS COCO

Training Resources 8x NVIDIA V100 GPUs

Training Time

Training Techniques	Weight Decay, SGD with Momentum
Architecture	RPN, RoIPool, FPN, Feedforward Network, 1x1 Convolution, Bottleneck Residual Block, Batch Normalization, Convolution, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax, Non Maximum Suppression
ID	fasterrcnn_resnet50_fpn
LR	0.02
Epochs	26
LR Steps	16, 22
Momentum	0.9
Batch Size	2
Memory (GB)	5.2
LR Step Size	8
Weight Decay	0.0001
train time (s/im)	0.2288
inference time (s/im)	0.059
Aspect Ratio Group Factor	3

README.md

Summary

Faster R-CNN is an object detection model that improves on Fast R-CNN by adding a region proposal network (RPN) to the CNN model. The RPN shares full-image convolutional features with the detection network, which makes region proposals nearly cost-free. The RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. It is trained end-to-end to generate high-quality region proposals, which Fast R-CNN then uses for detection. The RPN and Fast R-CNN are merged into a single network by sharing their convolutional features: the RPN component tells the unified network where to look.

As a whole, Faster R-CNN consists of two modules. The first module is a deep fully convolutional network that proposes regions, and the second module is the Fast R-CNN detector that uses the proposed regions.

How do I load this model?

To load a pretrained model:

import torchvision.models as models
# pretrained=True downloads the weights trained on MS COCO on first use
fasterrcnn_resnet50_fpn = models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

Replace the model name with the variant you want to use, e.g. fasterrcnn_mobilenet_v3_large_fpn. You can find the IDs in the model summaries at the top of this page.

To evaluate the model, use the object detection recipes (reference scripts) from the torchvision library.

How do I train this model?

You can follow the torchvision recipe on GitHub to train a new model from scratch.
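For orientation, the hyperparameters listed in the model summaries (LR 0.02, Momentum 0.9, Weight Decay 0.0001, Epochs 26, LR Steps 16, 22) could be mapped onto PyTorch's optimizer and scheduler classes roughly as follows. This is a sketch, not the recipe itself: the decay factor `LR_GAMMA = 0.1` is an assumption that does not appear on this page, the helper names are hypothetical, and the "LR Step Size" field is not used here.

```python
import torch

# Values copied from the model summaries on this page.
LR = 0.02
MOMENTUM = 0.9
WEIGHT_DECAY = 1e-4
EPOCHS = 26
LR_STEPS = [16, 22]
# Assumption: the learning rate is multiplied by 0.1 at each LR step.
LR_GAMMA = 0.1


def build_optimizer_and_scheduler(params):
    """Hypothetical helper: SGD with momentum plus a multi-step LR decay."""
    optimizer = torch.optim.SGD(
        params, lr=LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY
    )
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=LR_STEPS, gamma=LR_GAMMA
    )
    return optimizer, scheduler


def lr_per_epoch(params):
    """Hypothetical helper: record the learning rate used in each epoch."""
    optimizer, scheduler = build_optimizer_and_scheduler(params)
    lrs = []
    for _ in range(EPOCHS):
        lrs.append(optimizer.param_groups[0]["lr"])
        # A real loop would iterate over the training batches here.
        optimizer.step()
        scheduler.step()  # advance the schedule once per epoch
    return lrs
```

Passing `model.parameters()` (or only the parameters with `requires_grad=True`) as `params` gives the optimizer for a detection model. Under the assumed decay factor, the learning rate drops at epochs 16 and 22.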

Citation

@article{DBLP:journals/corr/RenHG015,
  author    = {Shaoqing Ren and
               Kaiming He and
               Ross B. Girshick and
               Jian Sun},
  title     = {Faster {R-CNN:} Towards Real-Time Object Detection with Region Proposal
               Networks},
  journal   = {CoRR},
  volume    = {abs/1506.01497},
  year      = {2015},
  url       = {http://arxiv.org/abs/1506.01497},
  archivePrefix = {arXiv},
  eprint    = {1506.01497},
  timestamp = {Mon, 13 Aug 2018 16:46:02 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/RenHG015.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

If you use the MobileNet V3 backbone:

@article{DBLP:journals/corr/abs-1905-02244,
  author    = {Andrew Howard and
               Mark Sandler and
               Grace Chu and
               Liang{-}Chieh Chen and
               Bo Chen and
               Mingxing Tan and
               Weijun Wang and
               Yukun Zhu and
               Ruoming Pang and
               Vijay Vasudevan and
               Quoc V. Le and
               Hartwig Adam},
  title     = {Searching for MobileNetV3},
  journal   = {CoRR},
  volume    = {abs/1905.02244},
  year      = {2019},
  url       = {http://arxiv.org/abs/1905.02244},
  archivePrefix = {arXiv},
  eprint    = {1905.02244},
  timestamp = {Tue, 12 Jan 2021 15:30:06 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1905-02244.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Results

Object Detection on COCO minival

Object Detection

BENCHMARK	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
COCO minival	Faster R-CNN ResNet-50 FPN	box AP	37.0	# 103
COCO minival	Faster R-CNN MobileNetV3-Large FPN	box AP	32.8	# 115
COCO minival	Faster R-CNN MobileNetV3-Large 320 FPN	box AP	22.8	# 121