Finally, we verify the proposed framework on the public KITTI dataset with different 3D object detectors.
In this paper, we propose a simple but effective framework, MapFusion, to integrate map information into modern 3D object detection pipelines.
To obtain clear street views and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that removes traffic agents from videos and synthesizes the missing regions under the guidance of depth/point cloud.
Ranked #1 on Image Inpainting on Apolloscape
Second, we propose a new framework for real-world depth map super-resolution (DSR), which consists of four modules: 1) an iterative residual learning module with deep supervision that learns effective high-frequency components of depth maps in a coarse-to-fine manner; 2) a channel attention strategy that enhances channels rich in high-frequency components; 3) a multi-stage fusion module that effectively re-exploits the intermediate results of the coarse-to-fine process; and 4) a depth refinement module that improves the depth map via TGV regularization and an input loss.
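The channel attention strategy in module 2 can be illustrated with a minimal squeeze-and-excitation-style sketch in NumPy. The layer sizes, reduction ratio, and random weights below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    Squeeze: global average pool per channel -> (C,)
    Excite:  two-layer MLP with ReLU + sigmoid -> per-channel scale in (0, 1)
    Scale:   multiply each channel by its attention weight.
    """
    squeezed = feat.mean(axis=(1, 2))              # (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)        # ReLU bottleneck, (C//r,)
    scores = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gates, (C,)
    return feat * scores[:, None, None], scores

# Toy example: C=8 channels, bottleneck reduction ratio r=2 (both assumed).
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))
w1 = rng.normal(size=(4, 8)) * 0.1
w2 = rng.normal(size=(8, 4)) * 0.1
out, scores = channel_attention(feat, w1, w2)
```

In the framework above, channels carrying abundant high-frequency components would receive gates near 1 and be amplified relative to the rest.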
To tackle this problem, we propose a simple but practical detection framework to jointly predict the 3D bounding box and instance segmentation.
Ranked #8 on 3D Object Detection on KITTI Cars Hard
In this paper, we propose an end-to-end online 3D video object detector that operates on point cloud sequences.
Conventional absolute camera pose estimation via a Perspective-n-Point (PnP) solver often assumes that the correspondences between 2D image pixels and 3D points are given.
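That assumption can be made concrete with the pinhole projection model a PnP solver inverts. This NumPy sketch (the intrinsics, pose, and points are illustrative, not taken from any paper above) projects known 3D points with a ground-truth pose to produce the "given" 2D-3D correspondences, and shows the reprojection residual that a PnP solver would minimize over candidate poses:

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points to Nx2 pixels with intrinsics K and pose (R, t)."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uvw = cam @ K.T                    # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

# Illustrative intrinsics and a small rotation about the y-axis (assumed values).
K = np.array([[720.0,   0.0, 320.0],
              [  0.0, 720.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = 0.1
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [           0.0, 1.0,           0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.2, -0.1, 5.0])

pts_3d = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.5],
                   [1.0, 1.0, 1.0]])
pts_2d = project(pts_3d, K, R, t)      # the "given" 2D-3D correspondences

# A PnP solver searches for the (R, t) that drives this residual to zero.
residual = np.linalg.norm(project(pts_3d, K, R, t) - pts_2d, axis=1)
```

When the correspondences are noisy or partly wrong, as in real pipelines, the residual is minimized in a robust fashion (e.g. inside RANSAC) rather than assumed to vanish.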
Recovering the absolute metric scale from a monocular camera is a challenging but highly desirable problem for monocular camera-based systems.
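One common way to frame this problem (a generic illustration, not the specific method of any paper above) is as recovering a single scalar that aligns up-to-scale monocular depth with metric depth. A median of per-pixel ratios is a standard robust estimator for that scalar:

```python
import numpy as np

def median_scale(pred_depth, metric_depth):
    """Estimate the scalar aligning up-to-scale depth to metric depth.

    The median of per-pixel ratios is robust to outliers, which matters
    because monocular depth is typically unreliable near object boundaries.
    """
    valid = (pred_depth > 0) & (metric_depth > 0)
    return np.median(metric_depth[valid] / pred_depth[valid])

# Simulate: metric depth is 3.5x an up-to-scale prediction, with 5% gross outliers.
rng = np.random.default_rng(1)
pred = rng.uniform(1.0, 10.0, size=1000)
metric = 3.5 * pred
metric[:50] *= 10.0                    # corrupt 5% of pixels
scale = median_scale(pred, metric)
```

In practice the metric reference comes from a source such as a known camera height, sparse LiDAR, or GPS displacement rather than ground-truth depth.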
Specifically, we first segment each car with a pre-trained Mask R-CNN, and then regress towards its 3D pose and shape based on a deformable 3D car model with or without using semantic keypoints.
Then, the image together with the retrieved shape model is fed into the proposed network to generate the fine-grained 3D point cloud.
In this paper, we provide a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving.