In this paper, we propose a novel edge-guided post-processing method to reduce occlusion fading in self-supervised monocular depth estimation.
Per-pixel ground-truth depth data is challenging to acquire at scale.
In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, which estimates foreground depth and background depth using separate optimization objectives and depth decoders.
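A minimal sketch of the two-branch idea described above: two depth maps (stand-ins for the outputs of separate foreground and background decoders) are merged with a foreground mask. The function name and merging rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def merge_foreground_background(depth_fg, depth_bg, fg_mask):
    """Combine per-branch depth maps using a (soft) foreground mask."""
    return fg_mask * depth_fg + (1.0 - fg_mask) * depth_bg

depth_fg = np.full((2, 2), 2.0)   # e.g. a nearby object, predicted by one decoder
depth_bg = np.full((2, 2), 10.0)  # e.g. distant background, predicted by the other
mask = np.array([[1.0, 0.0],
                 [0.0, 1.0]])     # 1 where a pixel belongs to the foreground
merged = merge_foreground_background(depth_fg, depth_bg, mask)
print(merged)  # [[ 2. 10.] [10.  2.]]
```

Keeping the branches separate lets each decoder be trained with an objective suited to its depth distribution before the predictions are fused.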
To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.
#13 best model for Monocular Depth Estimation on KITTI Eigen split (using extra training data)
By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.
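To make the index-guided downsampling/upsampling mechanism concrete, here is a sketch using fixed argmax indices rather than learned ones (the paper's contribution is to learn the index maps from data; everything below is a simplified illustration with hypothetical function names).

```python
import numpy as np

def pool_with_indices(x, k=2):
    """Downsample by k, recording which position in each k x k window won."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            idx[i, j] = win.argmax()   # a learned index map would replace this
            out[i, j] = win.max()
    return out, idx

def unpool_with_indices(x, idx, k=2):
    """Upsample by k, placing each value back at its recorded position."""
    h, w = x.shape
    out = np.zeros((h * k, w * k))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(idx[i, j], k)
            out[i*k + di, j*k + dj] = x[i, j]
    return out

x = np.array([[1., 2., 0., 3.],
              [4., 0., 1., 0.],
              [0., 1., 2., 0.],
              [5., 0., 0., 6.]])
down, idx = pool_with_indices(x)   # down == [[4., 3.], [5., 6.]]
up = unpool_with_indices(down, idx)
```

The unpooling stage reuses the indices captured during downsampling, which is why guiding both stages with the same (learned) index map preserves spatial detail.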
Monocular depth prediction plays a crucial role in understanding 3D scene geometry.
#2 best model for Monocular Depth Estimation on NYU-Depth V2
Estimating accurate depth from a single image is challenging because it is an ill-posed problem: infinitely many 3D scenes project to the same 2D image.
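The ill-posedness is easy to verify with a pinhole camera model: scaling an entire 3D scene by any factor leaves every projected pixel unchanged, so the image alone cannot determine absolute scale.

```python
# Pinhole projection: a 3D point (X, Y, Z) maps to (f*X/Z, f*Y/Z).
# Scaling the whole scene by s cancels out, so the projection is identical.

def project(X, Y, Z, f=1.0):
    """Project a 3D point onto the image plane of a pinhole camera."""
    return (f * X / Z, f * Y / Z)

p1 = project(1.0, 2.0, 4.0)    # original point
p2 = project(3.0, 6.0, 12.0)   # same point with the scene scaled by s = 3
print(p1 == p2)  # True: both project to (0.25, 0.5)
```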
Using our predicted error-map, we demonstrate that by up-filling a LiDAR point cloud from 18,000 points to 285,000 points, versus 300,000 points for full depth, we can reduce the RMSE from 1004 to 399.
We propose a Residual Pyramid Decoder (RPD) which expresses global scene structure in upper levels to represent layouts, and local structure in lower levels to represent shape details.
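The coarse-to-fine principle behind a residual pyramid can be sketched in a few lines: a coarse level encodes the scene layout, and each finer level adds a residual that restores shape detail. This is a generic illustration under assumed 2x upsampling, not the RPD architecture itself.

```python
import numpy as np

def upsample2x(d):
    """Nearest-neighbour 2x upsampling of a depth map."""
    return np.repeat(np.repeat(d, 2, axis=0), 2, axis=1)

coarse = np.array([[4.0]])                     # layout at the coarsest level
residual = np.array([[0.5, -0.5],
                     [0.25, -0.25]])           # detail predicted at the finer level
fine = upsample2x(coarse) + residual
print(fine)  # [[4.5  3.5 ] [4.25 3.75]]
```

Predicting residuals instead of full depth at each level lets every pyramid stage focus on the structure visible at its own scale.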
#7 best model for Monocular Depth Estimation on NYU-Depth V2
In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks.
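One common instance of a depth objective that is invariant to range and scale is a scale-and-shift-invariant MSE: the prediction is first aligned to the target by a least-squares scale and shift, and the error is measured afterwards. The sketch below shows the mechanism; it is an assumed simplification, not necessarily the paper's exact loss.

```python
import numpy as np

def scale_shift_invariant_mse(pred, target):
    """Solve for scale s and shift t minimizing ||s*pred + t - target||^2,
    then return the MSE of the aligned prediction."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.mean((s * pred + t - target) ** 2)

pred = np.array([1.0, 2.0, 3.0])
target = 5.0 * pred + 2.0          # same structure, different scale and shift
loss = scale_shift_invariant_mse(pred, target)
print(loss)  # ~0.0: alignment absorbs the scale/shift difference
```

Because the alignment absorbs per-dataset scale and shift, ground truth from sources with incompatible depth conventions can be mixed in one training set.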