Where in the World is this Image? Transformer-based Geo-localization in the Wild

29 Apr 2022 · Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo, Rama Chellappa

Predicting the geographic location (geo-localization) of a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include the huge diversity of images due to different environmental scenarios; drastic variation in the appearance of the same location depending on the time of day, weather, and season; and, more importantly, the fact that the prediction is made from a single image that may contain only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representations under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, exchanges information between its two parallel branches after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively, over the state of the art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.

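Task                          Dataset   Model         Metric Name                Metric Value  Global Rank
Photo geolocation estimation  GWS15k    TransLocator  Street level (1 km)        0.5           #4
Photo geolocation estimation  GWS15k    TransLocator  City level (25 km)         1.1           #4
Photo geolocation estimation  GWS15k    TransLocator  Region level (200 km)      8.0           #4
Photo geolocation estimation  GWS15k    TransLocator  Country level (750 km)     25.5          #4
Photo geolocation estimation  GWS15k    TransLocator  Continent level (2500 km)  48.3          #4
Photo geolocation estimation  Im2GPS3k  TransLocator  Street level (1 km)        11.8          #3
Photo geolocation estimation  Im2GPS3k  TransLocator  City level (25 km)         31.1          #4
Photo geolocation estimation  Im2GPS3k  TransLocator  Region level (200 km)      46.7          #3
Photo geolocation estimation  Im2GPS3k  TransLocator  Country level (750 km)     58.9          #5
Photo geolocation estimation  Im2GPS3k  TransLocator  Continent level (2500 km)  80.1          #4
Photo geolocation estimation  Im2GPS3k  TransLocator  Training Images            4.7M          #5
Photo geolocation estimation  YFCC26k   TransLocator  Street level (1 km)        7.2           #4
Photo geolocation estimation  YFCC26k   TransLocator  City level (25 km)         17.8          #4
Photo geolocation estimation  YFCC26k   TransLocator  Region level (200 km)      28.0          #4
Photo geolocation estimation  YFCC26k   TransLocator  Country level (750 km)     41.3          #4
Photo geolocation estimation  YFCC26k   TransLocator  Continent level (2500 km)  60.6          #4
Photo geolocation estimation  YFCC26k   TransLocator  Training Images            4.7M          #2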
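
The distance-threshold metrics listed above (street, city, region, country, and continent level) can be illustrated with a short sketch. It assumes that accuracy at a level is the percentage of test images whose predicted location falls within that level's radius of the true location, and that distance is measured as great-circle (haversine) distance on a sphere with the mean Earth radius. The page does not state either detail, and the function names, the Earth radius constant, and the inclusive `<=` comparison are choices made for this example rather than the authors' evaluation code.

```python
import math

# Radii in km, copied from the metric names in the results table.
THRESHOLDS_KM = {
    "street": 1,
    "city": 25,
    "region": 200,
    "country": 750,
    "continent": 2500,
}

# Mean Earth radius in km (assumption: a spherical Earth is accurate enough here).
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_thresholds(predicted, actual):
    """Percentage of predictions within each radius of the true location.

    `predicted` and `actual` are equal-length lists of (lat, lon) tuples.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    distances = [great_circle_km(*p, *a) for p, a in zip(predicted, actual)]
    return {
        level: 100.0 * sum(d <= radius for d in distances) / len(distances)
        for level, radius in THRESHOLDS_KM.items()
    }
```

Calling `accuracy_at_thresholds(predicted, actual)` with two lists of coordinate pairs returns one percentage per level, keyed by the level names in `THRESHOLDS_KM`.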
