In this paper, we propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features and transform image features into semantic-aware BEV features.
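The core filtering idea can be illustrated with a minimal sketch: mask image features with a per-pixel foreground probability from a segmentation head before pooling them into BEV. This is an assumption-laden toy, not SA-BEVPool's actual implementation; the function name, threshold, and tensor layout are hypothetical.

```python
import numpy as np

def semantic_aware_filter(features, seg_scores, fg_threshold=0.5):
    """Zero out pixels whose foreground probability falls below a threshold.

    features:   (H, W, C) image feature map
    seg_scores: (H, W) per-pixel foreground probability, assumed to come
                from a semantic-segmentation head (hypothetical input)
    """
    mask = (seg_scores >= fg_threshold).astype(features.dtype)
    # Background pixels are suppressed so they contribute nothing
    # to the subsequent image-to-BEV pooling step.
    return features * mask[..., None]

# Toy example: 2x2 feature map with one background pixel (score 0.2).
feats = np.ones((2, 2, 4))
scores = np.array([[0.9, 0.2],
                   [0.8, 0.7]])
out = semantic_aware_filter(feats, scores)
```

In this toy run the pixel at (0, 1) is zeroed out while the other three pass through unchanged.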
Ranked #6 on 3D Object Detection on nuScenes Camera Only
In search engines, query expansion (QE) is a crucial technique for improving the search experience.
A key challenge for LiDAR-based 3D object detection is to capture sufficient features from large-scale 3D scenes, especially for distant or occluded objects.
To the best of our knowledge, it is currently the largest manually-annotated Chinese dataset for open event extraction.
As a successful approach to self-supervised learning, contrastive learning aims to learn invariant information shared among distortions of the input sample.
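The invariance objective behind contrastive learning is commonly instantiated as an InfoNCE-style loss: embeddings of two distortions of the same sample are pulled together while other samples in the batch act as negatives. A minimal numpy sketch of that standard loss (not any specific paper's variant; the temperature value is an arbitrary choice):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss over a batch of paired augmented views.

    z1, z2: (N, D) embeddings of two distortions of the same N samples.
    Positive pairs are (z1[i], z2[i]); other rows serve as negatives.
    """
    # L2-normalise so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal of the similarity matrix.
    return -np.mean(np.diag(log_prob))
```

When the two views embed identically and negatives are dissimilar, the loss approaches zero, which is the invariance the abstract describes.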
To understand the reasons behind this phenomenon, we revisit the learning paradigm of knowledge distillation on the few-shot object detection task from a causal-theoretic standpoint and, accordingly, develop a Structural Causal Model.
In autonomous driving, LiDAR point clouds and RGB images are two major data modalities with complementary cues for 3D object detection.
Building on the traditional attention mechanism, multi-scale fusion self-attention extracts phrase information at different scales by applying convolution kernels of different sizes and computing a corresponding attention matrix at each scale, enabling the model to better capture phrase-level information.
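The multi-scale idea can be sketched as follows: for each kernel size, tokens are aggregated over a local window (a stand-in for a learned convolution that would extract phrase features), scaled dot-product self-attention is run on the result, and the per-scale outputs are fused. This is a simplified illustration under those assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_self_attention(x, kernel_sizes=(1, 3, 5)):
    """x: (T, D) token embeddings.

    For each kernel size k, average each token with its k-neighbourhood
    (approximating a phrase-extracting convolution), compute scaled
    dot-product self-attention at that scale, then fuse the scales.
    """
    T, D = x.shape
    outputs = []
    for k in kernel_sizes:
        pad = k // 2
        xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
        # Local window average: a crude stand-in for a learned conv kernel.
        phrase = np.stack([xp[i:i + k].mean(axis=0) for i in range(T)])
        attn = softmax(phrase @ phrase.T / np.sqrt(D), axis=-1)  # (T, T)
        outputs.append(attn @ phrase)
    # Fuse the per-scale attention outputs by averaging.
    return np.mean(outputs, axis=0)
```

Averaging is used here purely for simplicity; a learned fusion (e.g. concatenation plus projection) would be the more typical design choice.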
The conventional encoder-decoder framework for image captioning generally adopts a single-pass decoding process, which predicts the target descriptive sentence word by word in temporal order.
LiDAR-based 3D object detection is an important task for autonomous driving, yet current approaches struggle with the sparse and partial point clouds of distant and occluded objects.
Ranked #4 on 3D Object Detection on KITTI Cars Hard val