PointRend

facebookresearch / detectron2

Last updated on Feb 19, 2021

Parameters 79 Million

FLOPs 300 Billion

File Size 302.63 MB

Training Data MS COCO

Training Resources 8 NVIDIA V100 GPUs

Training Time

Training Techniques	SGD with Momentum, Random Horizontal Flip, Weight Decay
Architecture	PointRend, Mask R-CNN, FPN, ResNet
Max Iter	270000
Momentum	0.9
lr sched	3×
Weight Decay	0.0001
FLOPs Input No	100
Backbone Layers	101
Output Resolution	224×224

Parameters 60 Million

Backbone Layers 50

File Size 229.95 MB

Training Data MS COCO

Training Resources 8 NVIDIA V100 GPUs

Training Time

Training Techniques	SGD with Momentum, Random Horizontal Flip, Weight Decay
Architecture	PointRend, Mask R-CNN, FPN, ResNet
ID	164254221
Max Iter	90000
Momentum	0.9
lr sched	1×
Weight Decay	0.0001
Backbone Layers	50
Output Resolution	224×224

Parameters 56 Million

FLOPs 464 Billion

File Size 214.44 MB

Training Data Cityscapes

Training Resources 8 NVIDIA V100 GPUs

Training Time

Training Techniques	SGD with Momentum, Random Horizontal Flip, Weight Decay
Architecture	PointRend, Mask R-CNN, FPN, ResNet
ID	164255101
LR	0.01
Max Iter	24000
Momentum	0.9
lr sched	1×
Weight Decay	0.0001
FLOPs Input No	100
Backbone Layers	50
Output Resolution	224×224

Parameters 60 Million

Backbone Layers 50

File Size 229.95 MB

Training Data MS COCO

Training Resources 8 NVIDIA V100 GPUs

Training Time

Training Techniques	SGD with Momentum, Random Horizontal Flip, Weight Decay
Architecture	PointRend, Mask R-CNN, FPN, ResNet
ID	164955410
Max Iter	270000
Momentum	0.9
lr sched	3×
Weight Decay	0.0001
Backbone Layers	50
Output Resolution	224×224

Parameters 123 Million

Backbone Layers 101

File Size 471.77 MB

Training Data MS COCO

Training Resources 8 NVIDIA V100 GPUs

Training Time

Training Techniques	SGD with Momentum, Random Horizontal Flip, Weight Decay
Architecture	PointRend, Mask R-CNN, FPN, ResNeXt
Max Iter	270000
Momentum	0.9
lr sched	3×
Weight Decay	0.0001
Backbone Layers	101
Output Resolution	224×224

Parameters 48 Million

Backbone Layers 101

File Size 182.36 MB

Training Data Cityscapes

Training Resources 8 NVIDIA V100 GPUs

Training Time

Training Techniques	SGD with Momentum, Random Horizontal Flip, Weight Decay
Architecture	PointRend, Mask R-CNN, FPN, SemanticFPN, ResNet
ID	202576688
LR	0.01
Max Iter	65000
Momentum	0.9
Weight Decay	0.0001
Backbone Layers	101
Output Resolution	1024×2048

README.md

Summary

PointRend is a module for image segmentation tasks, such as instance and semantic segmentation, that treats segmentation as an image rendering problem in order to efficiently "render" high-quality label maps. It uses a subdivision strategy to adaptively select a non-uniform set of points at which to compute labels. PointRend can be incorporated into popular meta-architectures for both instance segmentation (e.g. Mask R-CNN) and semantic segmentation (e.g. FCN). Its subdivision strategy efficiently computes high-resolution segmentation maps using an order of magnitude fewer floating-point operations than direct, dense computation.

The instance segmentation models listed here build on Mask R-CNN. Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs: RoIPool, the de facto core operation for attending to instances, performs coarse spatial quantization for feature extraction. To fix the misalignment, Mask R-CNN uses a simple, quantization-free layer, called RoIAlign, that faithfully preserves exact spatial locations.
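The point selection idea behind the subdivision strategy described in the summary can be illustrated with a small sketch. This is a simplified, pure-Python illustration and not the detectron2 implementation: the function name `select_uncertain_points` is hypothetical, and the code assumes a binary mask whose uncertainty is measured as closeness of the predicted probability to 0.5. In the full method the selected points would then be re-predicted by a point head on a finer grid; that step is omitted here.

```python
# Simplified sketch of uncertainty-based point selection (assumption: binary
# mask probabilities, uncertainty = closeness to 0.5). Not detectron2 code.

def select_uncertain_points(prob_map, num_points):
    """Return the (row, col) cells whose probability is closest to 0.5.

    prob_map: 2D list of floats in [0, 1], e.g. a coarse mask prediction.
    num_points: maximum number of cells to select for re-prediction.
    """
    scored = []
    for r, row in enumerate(prob_map):
        for c, p in enumerate(row):
            # Distance from 0.5: small distance means high uncertainty.
            scored.append((abs(p - 0.5), r, c))
    # Most uncertain first; ties are broken by (row, col) for determinism.
    scored.sort()
    return [(r, c) for _, r, c in scored[:num_points]]


coarse = [
    [0.02, 0.10, 0.45],
    [0.05, 0.60, 0.90],
    [0.30, 0.95, 0.99],
]
points = select_uncertain_points(coarse, num_points=3)
print(points)  # [(0, 2), (1, 1), (2, 0)]
```

Only the selected cells, which tend to lie near object boundaries, need a new label; confident interior and background cells keep their coarse prediction, which is where the savings over dense computation come from.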

Quick start and visualization

This Colab Notebook tutorial contains examples of PointRend usage and visualizations of its point sampling stages.

Training

To train a model with 8 GPUs run:

cd /path/to/detectron2/projects/PointRend
python train_net.py --config-file configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_1x_coco.yaml --num-gpus 8

Evaluation

Model evaluation can be done similarly:

cd /path/to/detectron2/projects/PointRend
python train_net.py --config-file configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_1x_coco.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint
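When the training and evaluation commands above are launched from a script, it can help to assemble them programmatically. The following is a minimal sketch using only the Python standard library; the helper `build_pointrend_command` is hypothetical and simply reproduces the flags shown above (`--config-file`, `--num-gpus`, `--eval-only`, and the `MODEL.WEIGHTS` override).

```python
import shlex


def build_pointrend_command(config_file, num_gpus=None, eval_only=False, weights=None):
    """Assemble the argument list for train_net.py (hypothetical helper)."""
    cmd = ["python", "train_net.py", "--config-file", config_file]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    if eval_only:
        if weights is None:
            raise ValueError("evaluation needs a checkpoint path for MODEL.WEIGHTS")
        # Config overrides are passed as trailing KEY VALUE pairs.
        cmd += ["--eval-only", "MODEL.WEIGHTS", weights]
    return cmd


CONFIG = "configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_1x_coco.yaml"

train_cmd = build_pointrend_command(CONFIG, num_gpus=8)
eval_cmd = build_pointrend_command(
    CONFIG, eval_only=True, weights="/path/to/model_checkpoint"
)

# shlex.join renders a list as a shell-safe command line (Python 3.8+).
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

The resulting lists can be passed to `subprocess.run(cmd, cwd="/path/to/detectron2/projects/PointRend")`, which matches the `cd` step in the commands above.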

Citation

@InProceedings{kirillov2019pointrend,
  title={{PointRend}: Image Segmentation as Rendering},
  author={Alexander Kirillov and Yuxin Wu and Kaiming He and Ross Girshick},
  journal={ArXiv:1912.08193},
  year={2019}
}

Results

Instance Segmentation on COCO minival

MODEL	MASK AP
PointRend (X101-FPN, 3×)	41.1
PointRend (R101-FPN, 3×)	40.1
PointRend (R50-FPN, 3×)	38.3
PointRend (R50-FPN, 1×)	36.2
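The "1×" and "3×" labels in the table above refer to the length of the learning-rate schedule. Going by the model cards on this page, the COCO models use 90000 iterations for 1× and 270000 for 3×, so the multiplier scales a 90000-iteration base. A small sketch of that arithmetic follows; the constant and function names are illustrative, and the relation is only taken from the COCO entries (the Cityscapes 1× model lists 24000 iterations instead).

```python
# Base iteration count for the COCO "1×" schedule, as listed in the model cards.
COCO_BASE_ITERS = 90000


def coco_max_iter(schedule_multiplier):
    """Return the Max Iter value implied by a COCO schedule multiplier."""
    return COCO_BASE_ITERS * schedule_multiplier


print(coco_max_iter(1))  # 90000
print(coco_max_iter(3))  # 270000
```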

Semantic Segmentation on Cityscapes val

MODEL	MIOU
SemanticFPN + PointRend (R101-FPN)	78.9

Instance Segmentation on Cityscapes val

MODEL	MASK AP
PointRend (R50-FPN, 1×, Cityscapes)	35.9