Unlike single-scan semantic segmentation, this task requires distinguishing the motion states of points in addition to their semantic categories.
Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category.
In this work, we study the varying-sparsity distribution of LiDAR points and present SphereFormer to directly aggregate information from dense, close points to sparse, distant ones.
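One way to let sparse distant points attend to dense near points is to partition points by direction rather than by Cartesian position, so that a single window spans the full radial extent of the scene. The sketch below illustrates this idea; the window counts and the exact partitioning are illustrative assumptions, not SphereFormer's actual implementation.

```python
import numpy as np

def radial_window_ids(points, num_theta=16, num_phi=8):
    """Assign each LiDAR point (x, y, z) to a window defined in spherical
    coordinates (azimuth, inclination). Each window spans the whole radial
    axis, so dense near points and sparse far points lying in the same
    direction land in the same window and can attend to each other.
    Illustrative sketch only: window counts are hypothetical.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    theta = np.arctan2(y, x)  # azimuth in [-pi, pi]
    phi = np.arccos(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # inclination in [0, pi]
    ti = np.minimum((theta + np.pi) / (2 * np.pi) * num_theta, num_theta - 1).astype(int)
    pi_ = np.minimum(phi / np.pi * num_phi, num_phi - 1).astype(int)
    return ti * num_phi + pi_  # one window id per point; attention runs within each id
```

A near point and a far point along the same ray share a window id, whereas a standard cubic window would separate them.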
Ranked #1 on 3D Semantic Segmentation on SemanticKITTI (using extra training data)
Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.
Ranked #1 on 3D Object Detection on Argoverse2
3D scenes are dominated by a large number of background points, which are redundant for a detection task that mainly needs to focus on foreground objects.
In this paper, we propose the Stratified Transformer, which captures long-range contexts and demonstrates strong generalization and high performance.
Ranked #8 on Semantic Segmentation on ScanNet