You Only Look Once: Unified, Real-Time Object Detection

CVPR 2016
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors.
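The abstract's "one evaluation" claim is easier to picture by decoding the network's output grid. The sketch below assumes the setup the full paper uses for PASCAL VOC: an S×S grid with S = 7, B = 2 boxes per cell and C = 20 classes, giving a 7×7×30 output. The per-cell memory layout, the `decode` function and its 0.2 default threshold are hypothetical choices for illustration, not the authors' Darknet code. The paper's network regresses the square roots of box width and height; this sketch assumes they have already been squared. Non-maximum suppression, which would normally follow, is omitted.

```python
from typing import List, Tuple

# PASCAL VOC setting from the paper: S x S cells, B boxes per cell,
# C classes -> S x S x (B*5 + C) = 7 x 7 x 30 outputs.
S, B, C = 7, 2, 20

# (x1, y1, x2, y2, class index, class-specific score)
Detection = Tuple[float, float, float, float, int, float]


def decode(pred: List[List[List[float]]], img_w: float, img_h: float,
           threshold: float = 0.2) -> List[Detection]:
    """Turn one S x S x (B*5 + C) prediction grid into scored boxes.

    Hypothetical per-cell layout: B blocks of (x, y, w, h, conf), then C
    conditional class probabilities. x, y are offsets inside the cell in
    [0, 1]; w, h are fractions of the whole image (already squared).
    """
    detections: List[Detection] = []
    for row in range(S):
        for col in range(S):
            cell = pred[row][col]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # Box centre in pixels: cell index plus in-cell offset.
                cx = (col + x) / S * img_w
                cy = (row + y) / S * img_h
                bw, bh = w * img_w, h * img_h
                for k, p in enumerate(class_probs):
                    # Class-specific score = Pr(class | object) * box confidence.
                    score = p * conf
                    if score >= threshold:
                        detections.append((cx - bw / 2, cy - bh / 2,
                                           cx + bw / 2, cy + bh / 2, k, score))
    return detections


# Usage: an all-zero grid except one confident box in cell (row 3, col 3).
grid = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
grid[3][3][0:5] = [0.5, 0.5, 0.2, 0.4, 0.9]  # box centred in its cell
grid[3][3][B * 5 + 11] = 0.8                 # conditional prob. for class 11
print(decode(grid, 448, 448))
```

Because every box and class score comes out of the same forward pass, decoding is a single loop over the grid rather than a per-region classification step, which is the property the abstract credits for YOLO's speed.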


Evaluation

Task                        Dataset          Model  Metric  Value  Global rank
Real-Time Object Detection  COCO             YOLO   mAP     63.4%  #1
Real-Time Object Detection  COCO             YOLO   FPS     45     #1
Object Detection            PASCAL VOC 2007  YOLO   mAP     63.4%  #16
Real-Time Object Detection  PASCAL VOC 2007  YOLO   mAP     63.4%  #5
Real-Time Object Detection  PASCAL VOC 2007  YOLO   FPS     46     #1