The deployment of 3D detectors poses one of the major challenges in real-world self-driving scenarios.
The PTT module in the voting stage models the interactions among point patches and thereby learns context-dependent features.
Using cross-attention, the transformer decoder fuses the features and incorporates more target cues into the current point cloud feature to compute region attention, which makes the similarity computation more efficient.
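As a rough illustration of the cross-attention step described above, the following minimal sketch lets search-region point features act as queries over template (target) point features. It is not the authors' implementation: the function names, the feature shapes, the residual fusion, and the omission of learned query/key/value projections are all assumptions made to keep the example short.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(search_feats, template_feats):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    search_feats:   (N, d) features of the current point cloud (queries).
    template_feats: (M, d) features of the target template (keys and values).
    Learned projections are omitted for brevity; this is an assumption.
    """
    d = search_feats.shape[-1]
    scores = search_feats @ template_feats.T / np.sqrt(d)  # (N, M)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    # Residual fusion: each search point keeps its own feature and adds
    # an attention-weighted summary of the template features.
    fused = search_feats + weights @ template_feats        # (N, d)
    return fused, weights

rng = np.random.default_rng(0)
search = rng.standard_normal((128, 32))    # 128 search points, 32-dim features
template = rng.standard_normal((64, 32))   # 64 template points
fused, weights = cross_attention(search, template)
```

Each row of `weights` indicates how strongly one search point attends to every template point, which is one way target cues can be injected into the current point cloud feature.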
Our code is open-sourced for the robotics community at https://github.com/shanjiayao/PTT.
Then, to fuse the feature information of the two branches and obtain their similarity, we propose two cross-correlation modules, named Pointcloud-wise and Point-wise, respectively.
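The text does not specify how the two modules compute similarity, so the sketch below is only one plausible reading. It assumes cosine similarity, takes "Point-wise" to mean comparing every search point with every template point, and takes "Pointcloud-wise" to mean comparing every search point with a single max-pooled template descriptor. All function names and shapes are hypothetical.

```python
import numpy as np

def _l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length; eps guards against division by zero.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def pointwise_similarity(template_feats, search_feats):
    """Cosine similarity between every search point and every template point.

    template_feats: (M, d), search_feats: (N, d). Returns an (N, M) matrix.
    """
    return _l2_normalize(search_feats) @ _l2_normalize(template_feats).T

def pointcloudwise_similarity(template_feats, search_feats):
    """Cosine similarity between every search point and one global template
    descriptor obtained by max pooling (the pooling choice is an assumption).

    Returns an (N,) vector.
    """
    global_desc = _l2_normalize(template_feats.max(axis=0))  # (d,)
    return _l2_normalize(search_feats) @ global_desc

rng = np.random.default_rng(0)
template = rng.standard_normal((64, 32))   # template-branch features
search = rng.standard_normal((128, 32))    # search-branch features
point_sim = pointwise_similarity(template, search)       # (128, 64)
cloud_sim = pointcloudwise_similarity(template, search)  # (128,)
```

Under this reading, the point-wise map preserves local correspondence at a cost of N×M comparisons, while the pointcloud-wise score is cheaper and captures only global resemblance to the target.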