The recent emergence of numerous Synthetic Aperture Radar (SAR) sensors and target datasets has made it possible to unify downstream tasks with self-supervised learning techniques, paving the way for a foundation model in the field of SAR target recognition.
This generator aggregates the features extracted by the backbone and employs them as the condition to guide the point-to-point recovery from the noisy point cloud, thereby assisting the backbone in capturing both local and global geometric priors as well as the global point density distribution of the object.
Besides, we construct the OpenPCSeg codebase, which is the largest and most comprehensive outdoor LiDAR segmentation codebase.
Human-centric scene understanding is significant for real-world applications, but it is extremely challenging due to the existence of diverse human poses and actions, complex human-environment interactions, severe occlusions in crowds, etc.
They typically align visual features with semantic features obtained from word embeddings under the supervision of seen-class annotations.
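The alignment described above can be sketched as follows. This is a minimal, hypothetical illustration (the function and variable names are not from any of the papers listed here): each class is represented by a word-embedding vector, a sample's visual feature is compared against all class embeddings by cosine similarity, and the most similar class wins, which is what allows unseen classes to be recognized at test time.

```python
import numpy as np

def zero_shot_classify(visual_feat, class_embeddings):
    """Assign a sample to the class whose word embedding is most
    similar (by cosine) to its visual feature."""
    v = visual_feat / np.linalg.norm(visual_feat)
    C = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    scores = C @ v                     # one cosine similarity per class
    return int(np.argmax(scores))

# Toy example: 3 classes with orthogonal 4-d "word embeddings".
class_emb = np.eye(3, 4)
sample = np.array([0.1, 0.9, 0.0, 0.05])  # closest to class 1's direction
print(zero_shot_classify(sample, class_emb))  # 1
```

In practice the visual feature would first be projected into the embedding space by a learned mapping trained on seen classes; the cosine-similarity classification step itself stays the same.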
Clothes-invariant feature extraction is critical to clothes-changing person re-identification (CC-ReID).
We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data, including 2D images and 3D LiDAR point clouds.
We propose a simple yet effective label rectification strategy, which uses off-the-shelf panoptic segmentation labels to remove the traces of dynamic objects in completion labels, greatly improving the performance of deep models especially for those moving objects.
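The rectification idea above can be sketched in a few lines. This is a simplified, hypothetical version (label ids and the free-space convention are made up for illustration): wherever the panoptic segmentation marks a voxel as belonging to a dynamic class, the completion label is reset to free space, removing the "trace" a moving object leaves behind in the aggregated ground truth.

```python
import numpy as np

# Hypothetical label ids; real datasets define their own mappings.
DYNAMIC_CLASSES = {1, 2}   # e.g. car, pedestrian
FREE_SPACE = 0

def rectify_labels(completion_labels, panoptic_labels):
    """Reset completion labels to free space wherever the panoptic
    segmentation assigns a dynamic class."""
    rectified = completion_labels.copy()
    dynamic_mask = np.isin(panoptic_labels, list(DYNAMIC_CLASSES))
    rectified[dynamic_mask] = FREE_SPACE
    return rectified

completion = np.array([3, 1, 3, 2, 0])
panoptic   = np.array([3, 1, 3, 2, 0])
print(rectify_labels(completion, panoptic))  # [3 0 3 0 0]
```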
Ranked #1 on 3D Semantic Scene Completion on SemanticKITTI
We show that, for the first time, a range view method is able to surpass its point-, voxel-, and multi-view-fusion-based counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.
Ranked #2 on 3D Semantic Segmentation on SemanticKITTI
Notably, LoGoNet ranks 1st on the Waymo 3D object detection leaderboard with 81.02 mAPH (L2) detection performance.
For the first time, our pre-trained network achieves annotation-free 3D semantic segmentation with 20.8% and 25.08% mIoU on nuScenes and ScanNet, respectively.
To address these problems, we construct the homogeneous structure between the point cloud and images to avoid projective information loss by transforming the camera features into the LiDAR 3D space.
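Lifting camera features into the LiDAR 3D space typically involves unprojecting pixels with a depth estimate and applying the camera-to-LiDAR extrinsics. The sketch below is a generic pinhole-camera version, not the specific method of the paper; the matrix names and numbers are illustrative assumptions.

```python
import numpy as np

def camera_to_lidar(pix_uv, depth, K, T_cam_to_lidar):
    """Lift a pixel with known depth into camera 3D coordinates,
    then map it into the LiDAR frame with a rigid transform."""
    u, v = pix_uv
    xyz_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    xyz_h = np.append(xyz_cam, 1.0)          # homogeneous coordinates
    return (T_cam_to_lidar @ xyz_h)[:3]

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])        # toy intrinsics
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 1.0]                   # toy extrinsics: 1 m offset
pt = camera_to_lidar((320.0, 240.0), depth=10.0, K=K, T_cam_to_lidar=T)
print(pt)  # principal-axis ray at 10 m, shifted by 1 m: [0. 0. 11.]
```

Applying the same transform to every pixel of a feature map places camera features at explicit 3D locations, so they can be fused with LiDAR features without a lossy projection to the image plane.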
To further enhance the semantic consistency between the teacher and student model, we present a latent-direction-based distillation loss that preserves the semantic relations in latent space.
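One common way to realize a relation-preserving distillation loss of the kind described above is to match the pairwise similarity structure of the two latent spaces. The sketch below is an assumption-laden illustration (function names invented here), not the paper's exact loss: it penalizes differences between the teacher's and student's cosine-similarity matrices over a batch of features.

```python
import numpy as np

def relation_distill_loss(teacher_feats, student_feats):
    """MSE between the pairwise cosine-similarity matrices of the
    teacher's and student's latent features."""
    def cosine_matrix(F):
        F = F / np.linalg.norm(F, axis=1, keepdims=True)
        return F @ F.T
    diff = cosine_matrix(teacher_feats) - cosine_matrix(student_feats)
    return float(np.mean(diff ** 2))

t = np.array([[1.0, 0.0], [0.0, 1.0]])
# Uniformly rescaled features have the same relational structure,
# so the loss is zero: it constrains directions, not magnitudes.
print(relation_distill_loss(t, 5.0 * t))  # 0.0
```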
In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant interest from both industry and academia due to its inherent advantages, such as providing an intuitive representation of the world and being conducive to data fusion.
This article addresses the problem of distilling knowledge from a large teacher model to a slim student network for LiDAR semantic segmentation.
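The classic form of such teacher-to-student knowledge transfer is a soft-target loss: the student is trained to match the teacher's temperature-softened class distribution at each point. The sketch below shows that standard formulation as a minimal numpy function; it is a generic illustration, not this article's specific distillation scheme.

```python
import numpy as np

def softmax(logits, T):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, T=2.0):
    """Mean KL divergence between temperature-softened teacher and
    student class distributions (the usual T**2 scaling keeps
    gradient magnitudes comparable across temperatures)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float((T ** 2) * kl.mean())

t = np.array([[2.0, 0.0, -1.0]])
print(kd_loss(t, t))  # identical predictions -> 0.0
```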
Ranked #5 on LIDAR Semantic Segmentation on nuScenes (val mIoU metric)
In addition, considering that pedestrians are sparsely distributed globally yet vary in density locally, we further propose a novel method, Density-aware Hierarchical heatmap Aggregation (DHA), to enhance pedestrian perception in crowded scenes.
To further enhance the semantic consistency between the teacher and student model, we present another latent-direction-based distillation loss that preserves the semantic relations in latent space.
With the contribution of the CCD and CRP, our CRCKD algorithm can distill the relational knowledge more comprehensively.
Channel pruning is broadly recognized as an effective approach to obtaining a small, compact model by eliminating unimportant channels from a large, cumbersome network.
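A minimal sketch of the idea, assuming the widely used L1-norm importance criterion (one common heuristic among many; the function names here are invented for illustration): rank the output channels of a convolution layer by the L1 norm of their filters and keep only the top fraction.

```python
import numpy as np

def prune_channels(conv_weight, keep_ratio=0.5):
    """Rank output channels of a conv layer (shape: out, in, kh, kw)
    by filter L1 norm and keep the highest-ranked fraction."""
    norms = np.abs(conv_weight).sum(axis=(1, 2, 3))   # per-channel L1
    n_keep = max(1, int(round(keep_ratio * len(norms))))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # indices to retain
    return conv_weight[keep], keep

# Toy layer: 4 output channels, 3 input channels, 3x3 kernels,
# with per-channel magnitudes 0.1, 2.0, 0.01, 1.0.
w = np.ones((4, 3, 3, 3)) * np.array([0.1, 2.0, 0.01, 1.0])[:, None, None, None]
pruned, kept = prune_channels(w, keep_ratio=0.5)
print(kept)          # channels with the largest L1 norms: [1 3]
print(pruned.shape)  # (2, 3, 3, 3)
```

Pruning a layer's output channels also removes the matching input channels of the next layer, which is why channel pruning yields genuinely smaller, faster models rather than just sparse weight tensors.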
We study the problem of distilling knowledge from a large deep teacher network to a much smaller student network for the task of road marking segmentation.
Ranked #1 on Semantic Segmentation on ApolloScape
Training deep models for lane detection is challenging due to the very subtle and sparse supervisory signals inherent in lane annotations.
Ranked #6 on Lane Detection on BDD100K val
Lane detection is an important yet challenging task in autonomous driving, affected by many factors, e.g., lighting conditions, occlusions caused by other vehicles, irrelevant markings on the road, and the inherently long, thin shape of lanes.
Ranked #19 on Lane Detection on TuSimple
In this paper, we considerably improve the accuracy and robustness of predictions through feature mimicking with heterogeneous auxiliary networks, a new and effective training method that provides much richer contextual signals beyond the steering direction alone.
Ranked #1 on Steering Control on BDD100K val
Recently, a state-of-the-art algorithm called deep deterministic policy gradient (DDPG) has achieved good performance on many continuous control tasks in the MuJoCo simulator.