3D Object Detection Models

Voxel Transformer

Introduced by Mao et al. in Voxel Transformer for 3D Object Detection

VoTr is a Transformer-based 3D backbone for 3D object detection from point clouds. It contains a series of sparse and submanifold voxel modules. Submanifold voxel modules perform multi-head self-attention strictly on the non-empty voxels, while sparse voxel modules can extract voxel features at empty locations. Long-range relationships between voxels are captured via self-attention.
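The submanifold behavior described above can be sketched as follows: each non-empty voxel attends only to the non-empty voxels inside a local window around it. This is a minimal single-head illustration with no learned projections or positional encodings (the window radius and feature size are illustrative choices, not the paper's actual configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def submanifold_self_attention(coords, feats, radius=1):
    """Attention restricted to non-empty voxels: voxel i attends to the
    non-empty voxels within `radius` (Chebyshev distance) of its coordinate.
    Simplified sketch: single head, identity Q/K/V projections."""
    out = np.zeros_like(feats)
    for i, ci in enumerate(coords):
        # neighbours = non-empty voxels inside the local window (incl. self)
        mask = np.all(np.abs(coords - ci) <= radius, axis=1)
        kv = feats[mask]
        scores = softmax(feats[i] @ kv.T / np.sqrt(feats.shape[1]))
        out[i] = scores @ kv
    return out

coords = np.array([[0, 0, 0], [0, 0, 1], [5, 5, 5]])   # sparse voxel coords
feats = np.random.randn(3, 8).astype(np.float32)
out = submanifold_self_attention(coords, feats)
```

Note that the isolated voxel at (5, 5, 5) has no neighbours in its window, so it attends only to itself and its feature passes through unchanged.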

Because non-empty voxels are naturally sparse yet numerous, directly applying a standard Transformer to voxels is non-trivial. To this end, VoTr uses a sparse voxel module and a submanifold voxel module, which operate on the empty and non-empty voxel positions, respectively. To further enlarge the attention range while keeping the computational overhead comparable to convolutional counterparts, two attention mechanisms are used for multi-head attention in these two modules: Local Attention and Dilated Attention. Furthermore, Fast Voxel Query is used to accelerate the querying process in multi-head attention.
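The attending-voxel selection can be sketched as a set of (start, end, stride) range specifications: a dense near range for Local Attention plus strided farther ranges for Dilated Attention, with a coordinate-to-index hash table standing in for Fast Voxel Query. The range parameters and the dict-based hash below are illustrative assumptions, not the paper's exact settings or its GPU hash implementation:

```python
def attending_offsets(ranges):
    """Build the candidate attending offset set from (start, end, stride)
    specs: offsets are sampled every `stride` voxels out to `end`, keeping
    only those at Chebyshev distance >= `start` from the query voxel."""
    offsets = set()
    for start, end, stride in ranges:
        for dx in range(-end, end + 1, stride):
            for dy in range(-end, end + 1, stride):
                for dz in range(-end, end + 1, stride):
                    if max(abs(dx), abs(dy), abs(dz)) >= start:
                        offsets.add((dx, dy, dz))
    return offsets

# A coordinate -> row-index hash stands in for Fast Voxel Query: it answers
# "is this voxel position non-empty, and where is its feature stored?"
coords = [(0, 0, 0), (0, 0, 2), (4, 0, 0)]
voxel_hash = {c: i for i, c in enumerate(coords)}

local = (0, 1, 1)     # Local Attention: dense 3x3x3 neighbourhood
dilated = (2, 4, 2)   # Dilated Attention: strided samples out to range 4
offs = attending_offsets([local, dilated])

center = (0, 0, 0)
attend_idx = [voxel_hash[(center[0] + o[0], center[1] + o[1], center[2] + o[2])]
              for o in offs
              if (center[0] + o[0], center[1] + o[1], center[2] + o[2]) in voxel_hash]
```

Only non-empty voxels found via the hash lookup enter the attention computation, so the attention range grows with the dilated ranges while the per-query cost stays bounded by the number of sampled offsets.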

Source: Voxel Transformer for 3D Object Detection

Tasks


Task Papers Share
3D Object Detection 1 25.00%
Computational Efficiency 1 25.00%
Object Detection 1 25.00%
Object Recognition 1 25.00%
