Stacked Homography Transformations for Multi-View Pedestrian Detection

Multi-view pedestrian detection aims to predict a bird's eye view (BEV) occupancy map from multiple camera views. This task is confronted with two challenges: how to establish the 3D correspondences from views to the BEV map and how to assemble occupancy information across views. In this paper, we propose a novel Stacked HOmography Transformations (SHOT) approach, which is motivated by approximating projections in 3D world coordinates via a stack of homographies. We first construct a stack of transformations for projecting views to the ground plane at different height levels. Then we design a soft selection module so that the network learns to predict the likelihood of the stack of transformations. Moreover, we provide an in-depth theoretical analysis on constructing SHOT and how well SHOT approximates projections in 3D world coordinates. SHOT is empirically verified to be capable of estimating accurate correspondences from individual views to the BEV map, leading to new state-of-the-art performance on standard evaluation benchmarks.

PDF Abstract

Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Multiview Detection MultiviewX SHOT MODA 88.3 # 7
MODP 82.0 # 5
Recall 91.5 # 4
Multiview Detection Wildtrack SHOT MODA 90.2 # 7
MODP 76.5 # 5
Recall 94.0 # 4

Methods


No methods listed for this paper. Add relevant methods here