Scaled-YOLOv4: Scaling Cross Stage Partial Network

We show that the YOLOv4 object detection neural network based on the CSP approach scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, and resolution, but also the structure of the network. The YOLOv4-large model achieves state-of-the-art results: 55.5% AP (73.4% AP50) on the MS COCO dataset at a speed of ~16 FPS on a Tesla V100, while with test-time augmentation (TTA), YOLOv4-large achieves 56.0% AP (73.3% AP50). To the best of our knowledge, this is currently the highest accuracy on the COCO dataset among all published work. The YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of 443 FPS on an RTX 2080Ti, while with TensorRT, batch size = 4, and FP16 precision, YOLOv4-tiny achieves 1774 FPS.
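The throughput figures in the abstract are easier to compare as time per image. The following is a minimal sketch of that arithmetic, not anything from the paper: the dictionary keys and the helper names `ms_per_image` and `speedup` are hypothetical, and the "~16 FPS" figure is treated as exactly 16 for illustration.

```python
# Throughput figures quoted in the abstract, in frames per second (FPS).
# The labels are illustrative; "~16 FPS" is approximated as 16.0.
REPORTED_FPS = {
    "YOLOv4-large (Tesla V100)": 16.0,
    "YOLOv4-tiny (RTX 2080Ti)": 443.0,
    "YOLOv4-tiny (TensorRT, FP16, batch size 4)": 1774.0,
}


def ms_per_image(fps: float) -> float:
    """Average milliseconds per image implied by a throughput in FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def speedup(fps_new: float, fps_base: float) -> float:
    """Ratio of two throughputs (how many times faster fps_new is)."""
    if fps_base <= 0:
        raise ValueError("fps_base must be positive")
    return fps_new / fps_base


if __name__ == "__main__":
    for name, fps in REPORTED_FPS.items():
        print(f"{name}: {ms_per_image(fps):.2f} ms/image")
    ratio = speedup(1774.0, 443.0)
    print(f"TensorRT/FP16/batch-4 vs. baseline tiny: {ratio:.2f}x")
```

Note that for the batched TensorRT figure this is amortized time per image, not the latency of a single request, since four images are processed together.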

CVPR 2021

Results from the Paper


Task              Dataset        Model                                    Metric   Value  Global Rank
Object Detection  COCO minival   YOLOv4-P7 CSP-P7 (single-scale, 16 fps)  box AP   55.4   # 47
                                                                          AP50     73.3   # 12
                                                                          AP75     60.7   # 6
                                                                          APS      38.1   # 8
                                                                          APM      59.5   # 7
                                                                          APL      67.4   # 14
Object Detection  COCO test-dev  YOLOv4-P5 with TTA                       box mAP  52.5   # 67
                                                                          AP50     70.3   # 34
                                                                          AP75     58     # 27
Object Detection  COCO test-dev  YOLOv4-P7 with TTA                       box mAP  55.8   # 44
                                                                          AP50     73.2   # 16
                                                                          AP75     61.2   # 12
Object Detection  COCO test-dev  YOLOv4-P6 CSP-P6 (single-scale, 32 fps)  box mAP  54.3   # 52
                                                                          AP50     72.3   # 18
                                                                          AP75     59.5   # 19
                                                                          APS      36.6   # 13
                                                                          APM      58.2   # 14