DELG

Introduced by Cao et al. in Unifying Deep Local and Global Features for Image Search

DELG is a convolutional neural network for image retrieval that combines generalized mean pooling for global features and attentive selection for local features. The entire network can be learned end-to-end by carefully balancing the gradient flow between two heads – requiring only image-level labels. This allows for efficient inference by extracting an image’s global feature, detected keypoints and local descriptors within a single model.

The model is enabled by leveraging hierarchical image representations that arise in CNNs, which are coupled to generalized mean pooling and attentive local feature detection. Secondly, a convolutional autoencoder module is adopted that can successfully learn low-dimensional local descriptors. This can be readily integrated into the unified model, and avoids the need of post-processing learning steps, such as PCA, that are commonly used. Finally, a procedure is used that enables end-to-end training of the proposed model using only image-level supervision. This requires carefully controlling the gradient flow between the global and local network heads during backpropagation, to avoid disrupting the desired representations.

Source: Unifying Deep Local and Global Features for Image Search

Read Paper See Code