GCNet

Last updated on Feb 23, 2021

Cascade Mask R-CNN (X-101-FPN, 1x)

Memory (M) 9200.0
inference time (s/im) 0.11905
File Size 366.65 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, Global Context Block, RoIAlign
lr sched 1x
Backbone Layers 101
Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r16))

Memory (M) 10300.0
inference time (s/im) 0.12987
File Size 384.95 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, Global Context Block, RoIAlign
lr sched 1x
Backbone Layers 101
Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4))

Memory (M) 10600.0
Backbone Layers 101
File Size 439.03 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, Global Context Block, RoIAlign
lr sched 1x
DCN Cascade Mask R-CNN (X-101-FPN, 1x)

lr sched 1x
Backbone Layers 101
File Size 297.47 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, Deformable Convolution, FPN, Global Context Block, RoIAlign
DCN Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r16))

lr sched 1x
Backbone Layers 101
File Size 276.85 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, Deformable Convolution, FPN, Global Context Block, RoIAlign
DCN Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4))

lr sched 1x
Backbone Layers 101
File Size 335.64 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, Deformable Convolution, FPN, Global Context Block, RoIAlign
Mask R-CNN (R-101-FPN, 1x)

Memory (M) 6400.0
inference time (s/im) 0.07519
File Size 242.32 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, Global Context Block, ResNet, RoIAlign
lr sched 1x
Backbone Layers 101
Mask R-CNN (R-101-FPN, 1x, GC(c3-c5, r16))

Memory (M) 7600.0
inference time (s/im) 0.08333
File Size 260.63 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, Global Context Block, ResNet, RoIAlign
lr sched 1x
Backbone Layers 101
Mask R-CNN (R-101-FPN, 1x, GC(c3-c5, r4))

Memory (M) 7800.0
inference time (s/im) 0.08475
File Size 314.70 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, Global Context Block, ResNet, RoIAlign
lr sched 1x
Backbone Layers 101
Mask R-CNN (R-50-FPN, 1x)

Memory (M) 4400.0
inference time (s/im) 0.06024
File Size 169.62 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, Global Context Block, ResNet, RoIAlign
lr sched 1x
Backbone Layers 50
Mask R-CNN (R-50-FPN, 1x, GC(c3-c5, r16))

Memory (M) 5000.0
inference time (s/im) 0.06452
File Size 179.26 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, Global Context Block, ResNet, RoIAlign
lr sched 1x
Backbone Layers 50
Mask R-CNN (R-50-FPN, 1x, GC(c3-c5, r4))

Memory (M) 5100.0
inference time (s/im) 0.06623
File Size 207.79 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, Convolution, Dense Connections, FPN, Global Context Block, ResNet, RoIAlign
lr sched 1x
Backbone Layers 50
Mask R-CNN (X-101-FPN, 1x)

Memory (M) 7600.0
inference time (s/im) 0.0885
File Size 241.03 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, Global Context Block, RoIAlign
lr sched 1x
Backbone Layers 101
Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r16))

Memory (M) 8800.0
inference time (s/im) 0.10204
File Size 259.33 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, Global Context Block, RoIAlign
lr sched 1x
Backbone Layers 101
Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4))

Memory (M) 9000.0
inference time (s/im) 0.10309
File Size 313.40 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture Softmax, RPN, ResNeXt, Convolution, Dense Connections, FPN, Global Context Block, RoIAlign
lr sched 1x
Backbone Layers 101
README.md

GCNet for Object Detection

By Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu.

We provide config files to reproduce the results in the paper for "GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond" on COCO object detection.
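
As a rough illustration of what such a config file might contain, the sketch below follows the MMDetection-style plugin convention for the R-50-FPN, GC(c3-c5, r16) variant. The base config path, the `ContextBlock` type name, and the `plugins`, `stages`, and `position` keys are assumptions that are not stated on this page; check them against the config files shipped with your MMDetection version.

```python
# Hypothetical config sketch (assumed MMDetection-style keys, not taken from this page).
# Assumed base config: a plain Mask R-CNN R-50-FPN 1x COCO setup.
_base_ = '../mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py'

model = dict(
    backbone=dict(
        plugins=[
            dict(
                # r16 is assumed to map to a bottleneck ratio of 1/16.
                cfg=dict(type='ContextBlock', ratio=1. / 16),
                # One flag per backbone stage (c2, c3, c4, c5): enable c3-c5 only.
                stages=(False, True, True, True),
                # Assumed name for "after the last 1x1 conv of each bottleneck".
                position='after_conv3',
            )
        ]
    )
)
```

Under the same assumptions, the r4 variant would only change `ratio` to `1. / 4`.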

Introduction

GCNet was initially described in an arXiv preprint (arXiv:1904.11492). By combining the advantages of Non-Local Networks (NLNet) and Squeeze-Excitation Networks (SENet), GCNet provides a simple, fast, and effective approach to global context modeling, which generally outperforms both NLNet and SENet on major benchmarks for various recognition tasks.

Citing GCNet

@article{cao2019GCNet,
  title={GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond},
  author={Cao, Yue and Xu, Jiarui and Lin, Stephen and Wei, Fangyun and Hu, Han},
  journal={arXiv preprint arXiv:1904.11492},
  year={2019}
}

Results and models

The results on COCO 2017val are shown in the tables below.

| Backbone | Model | Context | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| R-50-FPN | Mask | GC(c3-c5, r16) | 1x | 5.0 | | 39.7 | 35.9 | config | model / log |
| R-50-FPN | Mask | GC(c3-c5, r4) | 1x | 5.1 | 15.0 | 39.9 | 36.0 | config | model / log |
| R-101-FPN | Mask | GC(c3-c5, r16) | 1x | 7.6 | 11.4 | 41.3 | 37.2 | config | model / log |
| R-101-FPN | Mask | GC(c3-c5, r4) | 1x | 7.8 | 11.6 | 42.2 | 37.8 | config | model / log |

| Backbone | Model | Context | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| R-50-FPN | Mask | - | 1x | 4.4 | 16.6 | 38.4 | 34.6 | config | model / log |
| R-50-FPN | Mask | GC(c3-c5, r16) | 1x | 5.0 | 15.5 | 40.4 | 36.2 | config | model / log |
| R-50-FPN | Mask | GC(c3-c5, r4) | 1x | 5.1 | 15.1 | 40.7 | 36.5 | config | model / log |
| R-101-FPN | Mask | - | 1x | 6.4 | 13.3 | 40.5 | 36.3 | config | model / log |
| R-101-FPN | Mask | GC(c3-c5, r16) | 1x | 7.6 | 12.0 | 42.2 | 37.8 | config | model / log |
| R-101-FPN | Mask | GC(c3-c5, r4) | 1x | 7.8 | 11.8 | 42.2 | 37.8 | config | model / log |
| X-101-FPN | Mask | - | 1x | 7.6 | 11.3 | 42.4 | 37.7 | config | model / log |
| X-101-FPN | Mask | GC(c3-c5, r16) | 1x | 8.8 | 9.8 | 43.5 | 38.6 | config | model / log |
| X-101-FPN | Mask | GC(c3-c5, r4) | 1x | 9.0 | 9.7 | 43.9 | 39.0 | config | model / log |
| X-101-FPN | Cascade Mask | - | 1x | 9.2 | 8.4 | 44.7 | 38.6 | config | model / log |
| X-101-FPN | Cascade Mask | GC(c3-c5, r16) | 1x | 10.3 | 7.7 | 46.2 | 39.7 | config | model / log |
| X-101-FPN | Cascade Mask | GC(c3-c5, r4) | 1x | 10.6 | | 46.4 | 40.1 | config | model / log |
| X-101-FPN | DCN Cascade Mask | - | 1x | | | 44.9 | 38.9 | config | model / log |
| X-101-FPN | DCN Cascade Mask | GC(c3-c5, r16) | 1x | | | 44.6 | | config | model / log |
| X-101-FPN | DCN Cascade Mask | GC(c3-c5, r4) | 1x | | | 45.7 | 39.5 | config | model / log |
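
The model cards at the top of this page give inference time in seconds per image, while the results tables give frames per second. Assuming the two are simple reciprocals (the listed values are consistent with this, for example 0.11905 s/im against 8.4 fps), a minimal conversion sketch follows. The function name is hypothetical.

```python
def seconds_per_image_to_fps(seconds_per_image: float) -> float:
    """Convert an inference time in s/im to fps, rounded to one decimal."""
    if seconds_per_image <= 0:
        raise ValueError("inference time must be positive")
    return round(1.0 / seconds_per_image, 1)


if __name__ == "__main__":
    # Values copied from the model cards on this page.
    for label, s_per_im in [
        ("Cascade Mask R-CNN (X-101-FPN, 1x)", 0.11905),
        ("Mask R-CNN (R-50-FPN, 1x)", 0.06024),
    ]:
        print(label, seconds_per_image_to_fps(s_per_im), "fps")
```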

Notes:

  • SyncBN is added in the backbone for all models in Table 2 (the second table above).
  • GC denotes that a Global Context (GC) block is inserted after the 1x1 conv of the backbone.
  • DCN denotes replacing the 3x3 conv with a 3x3 Deformable Convolution in the c3-c5 stages of the backbone.
  • r4 and r16 denote ratio 4 and ratio 16 in the GC block, respectively.
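
To make the effect of the ratio concrete, the sketch below counts the learnable parameters a GC block would add, assuming a particular block layout that this page does not spell out: a 1x1 conv from C channels to 1 for attention pooling, then a 1x1 conv from C to C/r, a LayerNorm, and a 1x1 conv from C/r back to C. The ResNet-101 stage layout (512, 1024, and 2048 channels with 4, 23, and 3 blocks in c3-c5) and all function names are likewise assumptions for illustration.

```python
def gc_block_params(channels: int, ratio: int) -> int:
    """Count parameters of one GC block under the assumed layout."""
    hidden = channels // ratio                   # bottleneck width C / r
    attention = channels + 1                     # 1x1 conv C -> 1 (weights + bias)
    reduce = channels * hidden + hidden          # 1x1 conv C -> C/r
    norm = 2 * hidden                            # LayerNorm scale and shift
    expand = hidden * channels + channels        # 1x1 conv C/r -> C
    return attention + reduce + norm + expand


# Assumed (output channels, number of bottleneck blocks) for c3, c4, c5.
RESNET101_C3_C5 = ((512, 4), (1024, 23), (2048, 3))


def gc_total_params(stages, ratio: int) -> int:
    """Sum GC parameters when one block is added to every bottleneck."""
    return sum(blocks * gc_block_params(channels, ratio)
               for channels, blocks in stages)


if __name__ == "__main__":
    for ratio in (16, 4):
        total = gc_total_params(RESNET101_C3_C5, ratio)
        mib = total * 4 / 2**20  # float32 storage
        print(f"r{ratio}: {total:,} parameters, about {mib:.1f} MiB")
```

Under these assumptions, r16 on ResNet-101 comes to roughly 4.8 million extra parameters (about 18 MiB as float32), which is in line with the gap between the 242.32 MB baseline and 260.63 MB r16 checkpoints listed for Mask R-CNN (R-101-FPN). A smaller ratio value (r4) means a wider bottleneck and therefore more parameters.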

Results

Object Detection on COCO minival

MODEL BOX AP
Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4)) 46.4
Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r16)) 46.2
DCN Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4)) 45.7
DCN Cascade Mask R-CNN (X-101-FPN, 1x) 44.9
Cascade Mask R-CNN (X-101-FPN, 1x) 44.7
DCN Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r16)) 44.6
Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4)) 43.9
Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r16)) 43.5
Mask R-CNN (X-101-FPN, 1x) 42.4
Mask R-CNN (R-101-FPN, 1x, GC(c3-c5, r16)) 42.2
Mask R-CNN (R-101-FPN, 1x, GC(c3-c5, r4)) 42.2
Mask R-CNN (R-50-FPN, 1x, GC(c3-c5, r4)) 40.7
Mask R-CNN (R-101-FPN, 1x) 40.5
Mask R-CNN (R-50-FPN, 1x, GC(c3-c5, r16)) 40.4
Mask R-CNN (R-50-FPN, 1x) 38.4
Instance Segmentation on COCO minival
MODEL MASK AP
Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4)) 40.1
Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r16)) 39.7
DCN Cascade Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4)) 39.5
Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r4)) 39.0
DCN Cascade Mask R-CNN (X-101-FPN, 1x) 38.9
Mask R-CNN (X-101-FPN, 1x, GC(c3-c5, r16)) 38.6
Cascade Mask R-CNN (X-101-FPN, 1x) 38.6
Mask R-CNN (R-101-FPN, 1x, GC(c3-c5, r16)) 37.8
Mask R-CNN (R-101-FPN, 1x, GC(c3-c5, r4)) 37.8
Mask R-CNN (X-101-FPN, 1x) 37.7
Mask R-CNN (R-50-FPN, 1x, GC(c3-c5, r4)) 36.5
Mask R-CNN (R-101-FPN, 1x) 36.3
Mask R-CNN (R-50-FPN, 1x, GC(c3-c5, r16)) 36.2
Mask R-CNN (R-50-FPN, 1x) 34.6
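
To read the effect of the GC block off the Mask R-CNN rows above, the following sketch computes the AP gain of each GC variant over its baseline with the same backbone. The numbers are copied from the rankings above; the dictionary layout and function name are hypothetical.

```python
# (box AP, mask AP) per backbone and context setting, copied from this page.
MASK_RCNN_RESULTS = {
    "R-50-FPN": {"-": (38.4, 34.6), "r16": (40.4, 36.2), "r4": (40.7, 36.5)},
    "R-101-FPN": {"-": (40.5, 36.3), "r16": (42.2, 37.8), "r4": (42.2, 37.8)},
    "X-101-FPN": {"-": (42.4, 37.7), "r16": (43.5, 38.6), "r4": (43.9, 39.0)},
}


def gc_gain(backbone: str, context: str) -> tuple:
    """Return (box AP gain, mask AP gain) of a GC variant over its baseline."""
    base_box, base_mask = MASK_RCNN_RESULTS[backbone]["-"]
    box, mask = MASK_RCNN_RESULTS[backbone][context]
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(box - base_box, 1), round(mask - base_mask, 1)


if __name__ == "__main__":
    for backbone in MASK_RCNN_RESULTS:
        for context in ("r16", "r4"):
            box_gain, mask_gain = gc_gain(backbone, context)
            print(f"{backbone} GC({context}): +{box_gain} box AP, +{mask_gain} mask AP")
```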