Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling

The main challenge in multi-view pedestrian detection is integrating view-specific features into a unified space for comprehensive end-to-end perception. Prior multi-view detection methods have focused on projecting perspective-view features onto the ground plane, creating a "bird's eye view" (BEV) representation of the scene. This paper proposes a simple but effective architecture built on a non-parametric 3D feature-pulling strategy. For each valid voxel in the 3D feature volume, the strategy directly extracts the corresponding 2D features, addressing the feature loss that may arise in previous methods. The framework also introduces three novel modules, each designed to improve the generalization capabilities of multi-view detection systems. Extensive experiments demonstrate the efficacy of the proposed model, which achieves new state-of-the-art accuracy both in conventional scenarios and, in particular, on scene generalization benchmarks.
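To make the idea of pulling 2D features into a 3D volume more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `pull_features` and `pull_multi_view`, the assumption that each view has a 3x4 projection matrix mapping world coordinates directly to feature-map pixels, the bilinear sampling, and the fusion by averaging over the views that see a voxel are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np

def pull_features(feat, P, voxels):
    """Pull 2D features for 3D voxel centres (hypothetical helper).

    feat:   (C, H, W) feature map of one camera view
    P:      (3, 4) projection matrix, world coords -> feature-map pixels (assumed)
    voxels: (N, 3) voxel centres in world coordinates
    Returns (N, C) sampled features and an (N,) boolean validity mask.
    """
    C, H, W = feat.shape
    homo = np.concatenate([voxels, np.ones((len(voxels), 1))], axis=1)  # (N, 4)
    uvw = homo @ P.T                                                    # (N, 3)
    depth = uvw[:, 2]
    in_front = depth > 1e-6
    safe = np.where(in_front, depth, 1.0)        # avoid division by zero
    u, v = uvw[:, 0] / safe, uvw[:, 1] / safe
    # A voxel is valid if it lies in front of the camera and inside the image.
    valid = in_front & (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)

    # Bilinear interpolation at the projected locations.
    u, v = np.clip(u, 0, W - 1), np.clip(v, 0, H - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    top = feat[:, v0, u0] * (1 - du) + feat[:, v0, u1] * du   # (C, N)
    bot = feat[:, v1, u0] * (1 - du) + feat[:, v1, u1] * du
    out = (top * (1 - dv) + bot * dv).T                        # (N, C)
    out[~valid] = 0.0                                          # no feature for invalid voxels
    return out, valid

def pull_multi_view(feats, Ps, voxels):
    """Fuse views by averaging over the views in which each voxel is valid
    (an assumed fusion rule, chosen only for illustration)."""
    acc, cnt = 0.0, 0
    for feat, P in zip(feats, Ps):
        f, m = pull_features(feat, P, voxels)
        acc, cnt = acc + f, cnt + m
    return acc / np.maximum(cnt, 1)[:, None]
```

Because the sampling has no learnable parameters, the same code applies unchanged to a new camera layout as long as its projection matrices are known, which is one plausible reading of why a non-parametric pulling step would help scene generalization.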


Results from the Paper


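Task                 Dataset     Model  Metric  Value  Global Rank
Multiview Detection  GMVD        MVFP   MODA    73.3   #1
Multiview Detection  GMVD        MVFP   Recall  79.2   #1
Multiview Detection  MultiviewX  MVFP   MODA    95.7   #2
Multiview Detection  MultiviewX  MVFP   MODP    85.1   #3
Multiview Detection  MultiviewX  MVFP   Recall  97.2   #1
Multiview Detection  Wildtrack   MVFP   MODA    94.1   #1
Multiview Detection  Wildtrack   MVFP   MODP    78.8   #3
Multiview Detection  Wildtrack   MVFP   Recall  97.7   #1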
