Voxel RoI Pooling

Introduced by Deng et al. in Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

Voxel RoI Pooling is a RoI feature extractor extracts RoI features directly from voxel features for further refinement. It starts by dividing a region proposal into $G \times G \times G$ regular sub-voxels. The center point is taken as the grid point of the corresponding sub-voxel. Since $3 D$ feature volumes are extremely sparse (non-empty voxels account for $<3 \%$ spaces), we cannot directly utilize max pooling over features of each sub-voxel. Instead, features are integrated from neighboring voxels into the grid points for feature extraction. Specifically, given a grid point $g_{i}$, we first exploit voxel query to group a set of neighboring voxels $\Gamma_{i}=\left(\mathbf{v}_{i}^{1}, \mathbf{v}_{i}^{2}, \cdots, \mathbf{v}_{i}^{K}\right) .$ Then, we aggregate the neighboring voxel features with a PointNet module $\mathrm{a}$ as:

$$ \mathbf{\eta}_{i}=\max _{k=1,2, \cdots, K}\left(\Psi\left(\left[\mathbf{v}_{i}^{k}-\mathbf{g}_{i} ; \mathbf{\phi}_{i}^{k}\right]\right)\right) $$

where $\mathbf{v}_{i}-\mathbf{g}_{i}$ represents the relative coordinates, $\mathbf{\phi}_{i}^{k}$ is the voxel feature of $\mathbf{v}_{i}^{k}$, and $\Psi(\cdot)$ indicates an MLP. The max pooling operation $\max (\cdot)$ is performed along the channels to obtain the aggregated feature vector $\eta_{i} .$ Particularly, Voxel RoI pooling is exploited to extract voxel features from the 3D feature volumes out of the last two stages in the $3 \mathrm{D}$ backbone network. And for each stage, two Manhattan distance thresholds are set to group voxels with multiple scales. Then, we concatenate the aggregated features pooled from different stages and scales to obtain the RoI features.

Source: Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

Read Paper See Code

Papers

Paper	Code	Results	Date	Stars

Tasks

Task	Papers	Share
3D Object Detection	5	41.67%
Object Detection	5	41.67%
Feature Importance	1	8.33%
Scene Understanding	1	8.33%

Usage Over Time

This feature is experimental; we are continuously improving our matching algorithm.

Components

Component	Type	Add Remove
Max Pooling	Pooling Operations
PointNet	3D Representations

Categories

Add Remove

RoI Feature Extractors