Processing large indoor scenes is a challenging task, as scan registration and camera trajectory estimation methods accumulate errors over time.
In contrast, we propose GP2, a General-Purpose and Geometry-Preserving training scheme, and show that conventional SVDE models can learn correct shifts themselves without any post-processing, benefiting from stereo data even in the geometry-preserving setting.
Existing 3D object detection methods make prior assumptions about the geometry of objects, which, we argue, limits their generalization ability.
Ranked #1 on 3D Object Detection on S3DIS
To address this problem, we propose ImVoxelNet, a novel fully convolutional method for 3D object detection based on monocular or multi-view RGB images.
Ranked #1 on Monocular 3D Object Detection on SUN RGB-D
Our work shows that a model trained on this data along with conventional datasets can gain accuracy while predicting correct scene geometry.
We train a visual odometry model on synthetic data and do not use ground-truth poses; hence, this model can be considered unsupervised.
We find that while in many cases the accuracy of SLAM is very good, the robustness is still an issue.
We present a novel dataset for training and benchmarking semantic SLAM methods.
Optical Flow (OF) and depth are commonly used for visual odometry since they provide sufficient information about camera ego-motion in a rigid scene.
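To illustrate why optical flow and depth together constrain camera ego-motion in a rigid scene, the following is a minimal sketch of the standard pinhole relation between them, not the method of any paper listed here. The function name `rigid_flow`, the intrinsics matrix `K`, and the convention that `R` and `t` map points from the first camera frame to the second are all assumptions made for this example.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Optical flow induced by camera motion (R, t) in a static scene.

    depth: (h, w) depth map of the first frame.
    K:     (3, 3) pinhole intrinsics (assumed known).
    R, t:  rotation (3, 3) and translation (3,) mapping first-frame
           camera coordinates to second-frame camera coordinates.
    Returns an (h, w, 2) array of per-pixel (du, dv) displacements.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels

    rays = pix @ np.linalg.inv(K).T        # back-project to unit-depth rays
    points = rays * depth[..., None]       # 3D points in the first frame
    moved = points @ R.T + t               # same points in the second frame
    proj = moved @ K.T                     # project with the same intrinsics
    uv2 = proj[..., :2] / proj[..., 2:3]   # perspective division

    return uv2 - pix[..., :2]

# Hypothetical example: flat wall 2 units away, scene points shifted by
# 0.1 along x. The expected horizontal flow is fx * tx / Z = 100 * 0.1 / 2.
K = np.array([[100.0, 0.0, 3.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
depth = np.full((4, 6), 2.0)
flow = rigid_flow(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
print(flow[0, 0])
```

Given the depth and the observed flow, the only unknowns left in this relation are `R` and `t`, which is why the two signals are sufficient for odometry when the scene is rigid. Moving objects violate the relation, so the sketch applies only to static scene content.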