Feature Extractors

The Thinned U-shape Module (TUM) is a feature extraction block used in object detection models. It was introduced as part of the M2Det architecture. Unlike the structures used in FPN and RetinaNet, TUM adopts a thinner U-shape structure, as illustrated in the figure to the right. The encoder is a series of 3x3 convolution layers with stride 2. The decoder takes the outputs of these layers as its reference set of feature maps, whereas the original FPN uses the output of the last layer of each stage in the ResNet backbone.
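As a rough illustration of the encoder, the sketch below traces the spatial size of a feature map through a chain of 3x3, stride-2 convolutions using the standard convolution output-size formula. The function names, the padding of 1, the five layers, and the 40x40 input are assumptions chosen for the example; the description above does not specify them.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Standard output-size formula for a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size, num_layers=5, padding=1):
    """Spatial sizes seen by the encoder: the input, then one entry per layer.

    num_layers and padding are illustrative assumptions, not values taken
    from the M2Det description.
    """
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_out_size(sizes[-1], padding=padding))
    return sizes


print(encoder_sizes(40))  # [40, 20, 10, 5, 3, 2]
```

Each stride-2 layer roughly halves the resolution, so a single TUM encoder already yields a pyramid of progressively coarser maps for the decoder to reference.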

In addition, TUM adds 1x1 convolution layers after the upsample and element-wise sum operations in the decoder branch to enhance learning ability and keep the features smooth. In the context of M2Det, the decoder outputs of each TUM together form the multi-scale features of the current level. Taken as a whole, the outputs of the stacked TUMs form the multi-level, multi-scale features: the front TUM mainly provides shallow-level features, the middle TUM provides medium-level features, and the back TUM provides deep-level features.
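The organization of these features can be sketched as simple bookkeeping over feature-map sizes. The example below is only a sketch under stated assumptions: the encoder sizes are hard-coded for a hypothetical 40x40 input, three TUMs are stacked, the coarsest encoder map is counted as the first decoder output, and all function names are illustrative rather than taken from any M2Det implementation.

```python
# Assumed encoder sizes for one TUM (hypothetical 40x40 input, padding 1).
ENCODER_SIZES = [40, 20, 10, 5, 3, 2]


def tum_output_sizes(encoder_sizes):
    """Spatial sizes of one TUM's decoder outputs, coarsest to finest.

    Each decoder step upsamples the current map to the size of the matching
    encoder map, sums the two element-wise, and applies a 1x1 convolution,
    which leaves the spatial size unchanged.
    """
    outputs = [encoder_sizes[-1]]  # assumption: coarsest map is also an output
    for skip_size in reversed(encoder_sizes[:-1]):
        outputs.append(skip_size)  # upsample -> sum -> 1x1 conv keeps skip_size
    return outputs


def stacked_tum_features(encoder_sizes, num_tums=3):
    """Map each TUM index (0 = front, shallowest level) to its output sizes."""
    return {level: tum_output_sizes(encoder_sizes) for level in range(num_tums)}


features = stacked_tum_features(ENCODER_SIZES)
for level, scales in features.items():
    print(f"TUM {level}: scales {scales}")
```

Every TUM contributes the same set of scales, so the stack can be read as a grid indexed by level (which TUM) and scale (which decoder output).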

Source: M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

