RetinaNet Explained | Papers With Code

**RetinaNet** is a one-stage object detection model that uses a [focal loss](https://paperswithcode.com/method/focal-loss) function to address class imbalance during training. Focal loss applies a modulating term to the cross entropy loss in order to focus learning on hard negative examples. RetinaNet is a single, unified network composed of a *backbone* network and two task-specific *subnetworks*. The backbone is an off-the-shelf convolutional network responsible for computing a convolutional feature map over the entire input image. The first subnet performs convolutional object classification on the backbone's output; the second subnet performs convolutional bounding box regression. The two subnetworks feature a simple design that the authors propose specifically for one-stage, dense detection.

The motivation for focal loss becomes clear when compared with two-stage object detectors, where class imbalance is addressed by a two-stage cascade and sampling heuristics. The proposal stage (e.g., [Selective Search](https://paperswithcode.com/method/selective-search), [EdgeBoxes](https://paperswithcode.com/method/edgeboxes), [DeepMask](https://paperswithcode.com/method/deepmask), [RPN](https://paperswithcode.com/method/rpn)) rapidly narrows the set of candidate object locations to a small number (e.g., 1-2k), filtering out most background samples. In the second classification stage, sampling heuristics, such as a fixed foreground-to-background ratio or online hard example mining ([OHEM](https://paperswithcode.com/method/ohem)), are applied to maintain a manageable balance between foreground and background.

In contrast, a one-stage detector must process a much larger set of candidate object locations regularly sampled across an image. To tackle this, RetinaNet uses a focal loss function, a dynamically scaled cross entropy loss, where the scaling factor decays to zero as confidence in the correct class increases. Intuitively, this scaling factor can automatically down-weight the contribution of easy examples during training and rapidly focus the model on hard examples.
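To give a rough sense of how large the dense candidate set can be, the sketch below counts anchor boxes over a multi-scale feature pyramid. The strides (8 to 128), the 9 anchors per location, and the 640x640 input are assumptions taken from a commonly used RetinaNet configuration and are not stated in the text above; `count_dense_anchors` is a hypothetical helper name.

```python
import math


def count_dense_anchors(height, width,
                        strides=(8, 16, 32, 64, 128),
                        anchors_per_location=9):
    """Count candidate boxes a dense one-stage detector would score.

    Assumed setup (not from the text above): one feature map per stride,
    with a fixed number of anchors at every spatial location.
    """
    total = 0
    for stride in strides:
        # Spatial size of the feature map at this pyramid level.
        rows = math.ceil(height / stride)
        cols = math.ceil(width / stride)
        total += rows * cols * anchors_per_location
    return total


if __name__ == "__main__":
    # Compare with the 1-2k proposals kept by a two-stage detector.
    print(count_dense_anchors(640, 640))
```

Under these assumptions, a single 640x640 image yields tens of thousands of candidates, nearly all of them background, which is the imbalance the loss has to cope with.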

Formally, the focal loss adds a modulating factor $(1 - p\_{t})^\gamma$ to the standard cross entropy criterion, where $p\_{t}$ is the model's estimated probability for the ground-truth class and $\gamma \ge 0$ is a tunable *focusing* parameter. Setting $\gamma>0$ reduces the relative loss for well-classified examples ($p\_{t}>.5$), putting more focus on hard, misclassified examples.

$$ {\text{FL}(p\_{t}) = - (1 - p\_{t})^\gamma \log\left(p\_{t}\right)} $$
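The formula can be checked numerically with a minimal sketch in plain Python. The function names and the default $\gamma = 2$ are illustrative assumptions; the text above only requires $\gamma \ge 0$. This per-example scalar version is not a batched training loss.

```python
import math


def cross_entropy(p_t):
    """Standard cross entropy for one example: -log(p_t)."""
    return -math.log(p_t)


def focal_loss(p_t, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the ground-truth class.
    gamma=2.0 is an assumed default, not a value given in the text.
    """
    if not 0.0 < p_t <= 1.0:
        raise ValueError("p_t must be in (0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    for p_t in (0.1, 0.5, 0.9, 0.99):
        ce = cross_entropy(p_t)
        fl = focal_loss(p_t, gamma=2.0)
        # The ratio fl / ce equals the modulating factor (1 - p_t)^gamma.
        print(f"p_t={p_t:.2f}  CE={ce:.4f}  FL={fl:.6f}  ratio={fl / ce:.4f}")
```

With $\gamma = 0$ the function reduces to ordinary cross entropy. With $\gamma = 2$, an easy example at $p\_{t} = 0.9$ has its loss scaled by $0.01$, while a hard example at $p\_{t} = 0.1$ keeps a factor of $0.81$.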

Attached collections:

OBJECT DETECTION MODELS

ONE-STAGE OBJECT DETECTION MODELS


Task	Papers	Share
Object Detection	157	42.55%
Instance Segmentation	15	4.07%
General Classification	13	3.52%
Pedestrian Detection	10	2.71%
Autonomous Driving	7	1.90%
Classification	7	1.90%
Real-Time Object Detection	6	1.63%
Management	6	1.63%
Model Compression	5	1.36%

Component	Type
Focal Loss	Loss Functions
FPN	Feature Extractors
ResNet	Convolutional Neural Networks	(optional)
