3D Object Detection Models

Voxel Transformer

Introduced by Mao et al. in Voxel Transformer for 3D Object Detection

VoTr is a Transformer-based 3D backbone for 3D object detection from point clouds. It contains a series of sparse and submanifold voxel modules. Submanifold voxel modules perform multi-head self-attention strictly on the non-empty voxels, while sparse voxel modules can extract voxel features at empty locations. Long-range relationships between voxels are captured via self-attention.
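The submanifold behavior described above can be sketched as follows: each non-empty voxel attends only to the non-empty voxels inside a local window around it. This is a minimal single-head illustration with no learned projections or positional encodings (the window radius and feature size are illustrative choices, not the paper's actual configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def submanifold_self_attention(coords, feats, radius=1):
    """Attention restricted to non-empty voxels: voxel i attends to the
    non-empty voxels within `radius` (Chebyshev distance) of its coordinate.
    Simplified sketch: single head, identity Q/K/V projections."""
    out = np.zeros_like(feats)
    for i, ci in enumerate(coords):
        # neighbours = non-empty voxels inside the local window (incl. self)
        mask = np.all(np.abs(coords - ci) <= radius, axis=1)
        kv = feats[mask]
        scores = softmax(feats[i] @ kv.T / np.sqrt(feats.shape[1]))
        out[i] = scores @ kv
    return out

coords = np.array([[0, 0, 0], [0, 0, 1], [5, 5, 5]])   # sparse voxel coords
feats = np.random.randn(3, 8).astype(np.float32)
out = submanifold_self_attention(coords, feats)
```

Note that the isolated voxel at (5, 5, 5) has no neighbours in its window, so it attends only to itself and its feature passes through unchanged.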

Because non-empty voxels are naturally sparse yet numerous, directly applying a standard Transformer to voxels is non-trivial. To this end, VoTr uses a sparse voxel module and a submanifold voxel module, which operate on the empty and non-empty voxel positions, respectively. To further enlarge the attention range while keeping the computational overhead comparable to convolutional counterparts, two attention mechanisms are used for multi-head attention in these two modules: Local Attention and Dilated Attention. Furthermore, Fast Voxel Query is used to accelerate the querying process in multi-head attention.
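The attending-voxel selection can be sketched as a set of (start, end, stride) range specifications: a dense near range for Local Attention plus strided farther ranges for Dilated Attention, with a coordinate-to-index hash table standing in for Fast Voxel Query. The range parameters and the dict-based hash below are illustrative assumptions, not the paper's exact settings or its GPU hash implementation:

```python
def attending_offsets(ranges):
    """Build the candidate attending offset set from (start, end, stride)
    specs: offsets are sampled every `stride` voxels out to `end`, keeping
    only those at Chebyshev distance >= `start` from the query voxel."""
    offsets = set()
    for start, end, stride in ranges:
        for dx in range(-end, end + 1, stride):
            for dy in range(-end, end + 1, stride):
                for dz in range(-end, end + 1, stride):
                    if max(abs(dx), abs(dy), abs(dz)) >= start:
                        offsets.add((dx, dy, dz))
    return offsets

# A coordinate -> row-index hash stands in for Fast Voxel Query: it answers
# "is this voxel position non-empty, and where is its feature stored?"
coords = [(0, 0, 0), (0, 0, 2), (4, 0, 0)]
voxel_hash = {c: i for i, c in enumerate(coords)}

local = (0, 1, 1)     # Local Attention: dense 3x3x3 neighbourhood
dilated = (2, 4, 2)   # Dilated Attention: strided samples out to range 4
offs = attending_offsets([local, dilated])

center = (0, 0, 0)
attend_idx = [voxel_hash[(center[0] + o[0], center[1] + o[1], center[2] + o[2])]
              for o in offs
              if (center[0] + o[0], center[1] + o[1], center[2] + o[2]) in voxel_hash]
```

Only non-empty voxels found via the hash lookup enter the attention computation, so the attention range grows with the dilated ranges while the per-query cost stays bounded by the number of sampled offsets.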

Source: Voxel Transformer for 3D Object Detection

Tasks


Task Papers Share
3D Object Detection 1 25.00%
Computational Efficiency 1 25.00%
Object Detection 1 25.00%
Object Recognition 1 25.00%
