Multi-Scale Training

SNIP, or Scale Normalization for Image Pyramids, is a multi-scale training scheme that selectively back-propagates the gradients of object instances of different sizes as a function of the image scale. SNIP is a modified version of MST where only the object instances that have a resolution close to the pre-training dataset, which is typically 224x224, are used for training the detector. In multi-scale training (MST), each image is observed at different resolutions therefore, at a high resolution (like 1400x2000) large objects are hard to classify and at a low resolution (like 480x800) small objects are hard to classify. Fortunately, each object instance appears at several different scales and some of those appearances fall in the desired scale range. In order to eliminate extreme scale objects, either too large or too small, training is only performed on objects that fall in the desired scale range and the remainder are simply ignored during back-propagation. Effectively, SNIP uses all the object instances during training, which helps capture all the variations in appearance and pose, while reducing the domain-shift in the scale-space for the pre-trained network.

Source: An Analysis of Scale Invariance in Object Detection - SNIP

Papers


Paper Code Results Date Stars

Tasks


Components


Component Type
🤖 No Components Found You can add them if they exist; e.g. Mask R-CNN uses RoIAlign

Categories