SMDT: Cross-View Geo-Localization with Image Alignment and Transformer

The goal of cross-view geo-localization is to determine the location of a given ground image by matching it against aerial images. However, existing methods ignore the variability of scenes, auxiliary information, and the spatial correspondence between covisible and non-covisible areas in ground-aerial image pairs. In this context, we propose a cross-view matching method called SMDT, built on image alignment and a Transformer. First, we use semantic segmentation to separate the different areas. Then, we convert the vertical (top-down) view of the aerial image to a front view by mixing polar mapping and perspective mapping. Next, we simultaneously train dual conditional generative adversarial networks, taking the semantic segmentation maps and the converted images as input, to synthesize aerial images in a ground-view style. These steps are collectively referred to as image alignment. Last, we use a Transformer to explicitly exploit the properties of self-attention. Experiments show that SMDT outperforms existing ground-to-aerial cross-view methods.
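The polar mapping mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard polar transform used in cross-view geo-localization, which warps a square top-down aerial image into a panorama-like front view; the exact mapping in SMDT (which mixes polar and perspective mappings) is not specified in the abstract, so the formulas and names here are assumptions.

```python
import numpy as np

def polar_transform(aerial, height, width):
    """Warp a square S x S aerial (top-down) image into a
    ground-view-like panorama of size height x width.

    Rows of the panorama correspond to distance from the image
    center (far scene content maps to the top rows), columns to
    the azimuth angle. Illustrative nearest-neighbor sampling only.
    """
    S = aerial.shape[0]                           # aerial image is S x S
    out = np.zeros((height, width, aerial.shape[2]), dtype=aerial.dtype)
    for i in range(height):                       # panorama row -> radius
        for j in range(width):                    # panorama column -> azimuth
            theta = 2 * np.pi * j / width
            r = (S / 2) * (height - i) / height   # top rows sample far from center
            x = int(S / 2 + r * np.sin(theta))
            y = int(S / 2 - r * np.cos(theta))
            if 0 <= x < S and 0 <= y < S:         # guard against edge rounding
                out[i, j] = aerial[y, x]
    return out
```

In practice such a warp would be implemented with a vectorized remap (e.g. a sampling grid) rather than explicit loops; the loop form is kept here only for clarity.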
