RetinaNet is a one-stage object detection model that utilizes a focal loss function to address class imbalance during training. Focal loss applies a modulating term to the cross entropy loss in order to focus learning on hard negative examples. RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. The backbone is responsible for computing a convolutional feature map over an entire input image and is an off-the-self convolutional network. The first subnet performs convolutional object classification on the backbone's output; the second subnet performs convolutional bounding box regression. The two subnetworks feature a simple design that the authors propose specifically for one-stage, dense detection.
We can see the motivation for focal loss by comparing with two-stage object detectors. Here class imbalance is addressed by a two-stage cascade and sampling heuristics. The proposal stage (e.g., Selective Search, EdgeBoxes, DeepMask, RPN) rapidly narrows down the number of candidate object locations to a small number (e.g., 1-2k), filtering out most background samples. In the second classification stage, sampling heuristics, such as a fixed foreground-to-background ratio, or online hard example mining (OHEM), are performed to maintain a manageable balance between foreground and background.
In contrast, a one-stage detector must process a much larger set of candidate object locations regularly sampled across an image. To tackle this, RetinaNet uses a focal loss function, a dynamically scaled cross entropy loss, where the scaling factor decays to zero as confidence in the correct class increases. Intuitively, this scaling factor can automatically down-weight the contribution of easy examples during training and rapidly focus the model on hard examples.
Formally, the Focal Loss adds a factor $(1 - p_{t})^\gamma$ to the standard cross entropy criterion. Setting $\gamma>0$ reduces the relative loss for well-classified examples ($p_{t}>.5$), putting more focus on hard, misclassified examples. Here there is tunable focusing parameter $\gamma \ge 0$.
$$ {\text{FL}(p_{t}) = - (1 - p_{t})^\gamma \log\left(p_{t}\right)} $$
Source: Focal Loss for Dense Object DetectionPaper | Code | Results | Date | Stars |
---|
Task | Papers | Share |
---|---|---|
Object Detection | 153 | 42.62% |
Instance Segmentation | 15 | 4.18% |
General Classification | 13 | 3.62% |
Test | 9 | 2.51% |
Pedestrian Detection | 8 | 2.23% |
Autonomous Driving | 7 | 1.95% |
Classification | 7 | 1.95% |
Real-Time Object Detection | 6 | 1.67% |
Management | 6 | 1.67% |
Component | Type |
|
---|---|---|
![]() |
Loss Functions | |
![]() |
Feature Extractors | |
![]() |
Convolutional Neural Networks | (optional) |