The Transformer architecture has achieved remarkable success across a number of domains, including natural language processing and computer vision.
Self-supervised depth learning from monocular images typically relies on 2D pixel-wise photometric consistency between temporally adjacent image frames.
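The photometric objective mentioned here can be made concrete with a minimal sketch: given a target frame and a source frame already warped into the target view (via predicted depth and camera pose), the loss is the mean per-pixel intensity error. This is an illustrative simplification; practical self-supervised depth methods typically combine the L1 term with an SSIM term and mask occluded or out-of-view pixels.

```python
import numpy as np

def photometric_loss(target, warped_source):
    """Mean per-pixel L1 photometric error between the target frame and
    a source frame warped into the target view.

    Both inputs are H x W (or H x W x C) intensity arrays in [0, 1].
    """
    target = np.asarray(target, dtype=np.float64)
    warped_source = np.asarray(warped_source, dtype=np.float64)
    return float(np.mean(np.abs(target - warped_source)))
```

Identical frames give a loss of zero; the warping step (which requires the depth and pose predictions) is assumed to have happened upstream.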
In addition, we propose a deep conditional entropy model that estimates the probability distribution of the transformed coefficients by incorporating temporal context from consecutive point clouds together with motion estimation/compensation modules.
In this work, we propose an almost-universal sampler: one that learns to preserve the points most useful for a particular task, yet is inexpensive to adapt to different tasks, models, or datasets.
Scene flow is a powerful tool for capturing the motion field of 3D point clouds.
To reduce memory and computational cost, existing point-based pipelines usually adopt task-agnostic random sampling or farthest point sampling to progressively downsample input point clouds, despite the fact that not all points are equally important to the task of object detection.
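Farthest point sampling, mentioned above, can be sketched in a few lines: starting from an arbitrary seed point, repeatedly pick the point farthest from the set already selected. This is a plain NumPy illustration of the standard algorithm, not any particular pipeline's implementation.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Select k indices from an (n, 3) array by iteratively picking the
    point farthest from all previously selected points."""
    selected = [0]  # arbitrary seed point
    # distance from every point to its nearest selected point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(selected)
```

The O(nk) cost of this loop is one reason large-scale pipelines often fall back to cheaper random sampling.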
Specifically, we introduce a synthetic aerial photogrammetry point cloud generation pipeline that takes full advantage of open geospatial data sources and off-the-shelf commercial packages.
Each point in the dataset has been labelled with fine-grained semantic annotations, resulting in a dataset three times the size of the largest existing photogrammetric point cloud dataset.
In this paper, we introduce a neural architecture, termed Box2Seg, to learn point-level semantics of 3D point clouds with bounding box-level supervision.
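To make the idea of box-level supervision concrete: one simple way to turn bounding boxes into point-level training signal is to give every point inside a box that box's label and mark the rest as ignored. This is only an illustrative baseline for what box supervision provides, not the Box2Seg architecture itself; the box representation (axis-aligned min/max corners) is an assumption for the sketch.

```python
import numpy as np

def box_pseudo_labels(points, boxes, labels, ignore=-1):
    """Assign each point the label of the first axis-aligned box that
    contains it; points outside every box receive `ignore`.

    points: (n, 3) array; boxes: list of (lo, hi) corner pairs.
    """
    out = np.full(points.shape[0], ignore, dtype=np.int64)
    for (lo, hi), lab in zip(boxes, labels):
        inside = np.all((points >= lo) & (points <= hi), axis=1)
        out[inside & (out == ignore)] = lab
    return out
```

Such pseudo-labels are noisy near box boundaries and in overlapping boxes, which is precisely the ambiguity a learned approach must resolve.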
Satellite video cameras can provide continuous observation for a large-scale area, which is important for many remote sensing applications.
We study the problem of efficient semantic segmentation of large-scale 3D point clouds.
Fully labelling point clouds is highly time-consuming and costly.
Extracting robust and general 3D local features is key to downstream tasks such as point cloud registration and reconstruction.
An essential prerequisite for unleashing the potential of supervised deep learning algorithms in the area of 3D scene understanding is the availability of large-scale and richly annotated datasets.
To better understand and use Convolutional Neural Networks (CNNs), the visualization and interpretation of CNNs have attracted increasing attention in recent years.
To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds.
The framework directly regresses 3D bounding boxes for all instances in a point cloud, while simultaneously predicting a point-level mask for each instance.