A Feature Pyramid Network, or FPN, is a feature extractor that takes a single-scale image of arbitrary size as input and outputs proportionally sized feature maps at multiple levels, in a fully convolutional fashion. This process is independent of the backbone convolutional architecture. FPN therefore acts as a generic solution for building feature pyramids inside deep convolutional networks, for use in tasks such as object detection.
The construction of the pyramid involves a bottom-up pathway and a top-down pathway.
The bottom-up pathway is the feedforward computation of the backbone ConvNet. It computes a feature hierarchy consisting of feature maps at several scales with a scaling step of 2. For the feature pyramid, one pyramid level is defined for each stage, and the output of the last layer of each stage is used as that level's reference set of feature maps. For ResNets, these are the feature activations output by each stage's last residual block.
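To make the "scaling step of 2" concrete, the sketch below computes the spatial size of each stage output for a ResNet-like backbone. It assumes the stage naming C2 to C5 with strides 4, 8, 16 and 32 used in the FPN paper, and it assumes every stride-2 layer rounds odd sizes up (ceil(n / 2)), as a 3×3 stride-2 convolution with padding 1 does. The function names are hypothetical.

```python
from typing import Dict, Tuple


def halve(n: int) -> int:
    """Spatial size after one stride-2 step, assuming padding rounds up: ceil(n / 2)."""
    return (n + 1) // 2


def bottom_up_shapes(height: int, width: int) -> Dict[str, Tuple[int, int, int]]:
    """Return (stride, h, w) for the stage outputs C2..C5 of a ResNet-like backbone.

    Assumption: the stem (conv1 + max-pool) downsamples by 4 and each later
    stage by 2, so stage k has stride 2**k relative to the input image.
    """
    h, w = halve(halve(height)), halve(halve(width))  # stem: stride 4 gives C2
    shapes = {}
    for k in range(2, 6):
        if k > 2:
            h, w = halve(h), halve(w)  # each new stage halves the resolution
        shapes[f"C{k}"] = (2 ** k, h, w)
    return shapes


if __name__ == "__main__":
    print(bottom_up_shapes(800, 1216))
    # {'C2': (4, 200, 304), 'C3': (8, 100, 152), 'C4': (16, 50, 76), 'C5': (32, 25, 38)}
```

When the input size is not a multiple of 32 the levels stop being exact halves: a 600 × 1000 input gives C4 = 38 × 63 and C5 = 19 × 32, and doubling 19 × 32 yields 38 × 64. An implementation must then either pad the input to a multiple of 32 or resize the upsampled map to the lateral map's size before merging.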
The top-down pathway hallucinates higher-resolution features by upsampling spatially coarser, but semantically stronger, feature maps from higher pyramid levels. These features are then enhanced with features from the bottom-up pathway via lateral connections. Each lateral connection merges feature maps of the same spatial size from the bottom-up pathway and the top-down pathway. The bottom-up feature map has lower-level semantics, but its activations are more accurately localized because it was subsampled fewer times.
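The top-down pathway and its lateral connections can be sketched in plain Python on toy, single-image feature maps stored channels-first as nested lists. The sketch assumes the merge choices described in the FPN paper: the bottom-up map passes through a 1×1 convolution (to match channel counts), the coarser top-down map is upsampled 2× by nearest neighbor, and the two are combined by element-wise addition. The paper also applies a 3×3 convolution to each merged map to reduce upsampling aliasing; that step is omitted here. All names are hypothetical, and the weights, which would be learned in practice, are hand-picked.

```python
from typing import Dict, List

FeatureMap = List[List[List[float]]]  # fmap[channel][row][col]
Matrix = List[List[float]]  # 1x1-conv weights: weight[out_channel][in_channel]


def conv1x1(x: FeatureMap, weight: Matrix) -> FeatureMap:
    """1x1 convolution: mixes channels independently at every spatial position."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w_oc[c] * x[c][i][j] for c in range(len(x))) for j in range(width)]
            for i in range(height)
        ]
        for w_oc in weight
    ]


def upsample_nearest_2x(x: FeatureMap) -> FeatureMap:
    """Double height and width by copying each activation into a 2x2 block."""
    out = []
    for channel in x:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]
            rows.extend([wide, list(wide)])
        out.append(rows)
    return out


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum; the lateral connection requires identical shapes."""
    if (len(a), len(a[0]), len(a[0][0])) != (len(b), len(b[0]), len(b[0][0])):
        raise ValueError("top-down and lateral maps must have the same shape")
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def top_down(c: Dict[int, FeatureMap], lateral: Dict[int, Matrix]) -> Dict[int, FeatureMap]:
    """Build merged maps P[k] from bottom-up maps c[k], coarsest level first."""
    levels = sorted(c, reverse=True)  # e.g. [5, 4, 3, 2]
    coarsest = levels[0]
    p = {coarsest: conv1x1(c[coarsest], lateral[coarsest])}  # start: lateral only
    for coarse, fine in zip(levels, levels[1:]):
        p[fine] = add(upsample_nearest_2x(p[coarse]), conv1x1(c[fine], lateral[fine]))
    return p


if __name__ == "__main__":
    c = {
        4: [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [1.0, 1.0]]],  # 2 channels, 2x2
        5: [[[10.0]]],  # 1 channel, 1x1 (coarser, stride doubled)
    }
    lateral = {4: [[1.0, 1.0]], 5: [[0.5]]}  # both levels project to 1 channel
    p = top_down(c, lateral)
    print(p[5])  # [[[5.0]]]
    print(p[4])  # [[[6.0, 7.0], [9.0, 10.0]]]
```

Projecting every level to the same channel count is what makes the element-wise addition possible; the FPN paper uses 256 channels at every level, whereas this toy uses 1.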
Source: Feature Pyramid Networks for Object Detection
Task | Papers | Share |
---|---|---|
Object Detection | 293 | 31.07% |
Semantic Segmentation | 79 | 8.38% |
Instance Segmentation | 58 | 6.15% |
Image Classification | 41 | 4.35% |
General Classification | 23 | 2.44% |
Autonomous Driving | 18 | 1.91% |
Classification | 17 | 1.80% |
Real-Time Object Detection | 17 | 1.80% |
Pedestrian Detection | 15 | 1.59% |