This paper addresses the importance of full-image supervision for monocular depth estimation.
Autonomous systems must be able to infer their own ego-motion, understand their surroundings, and plan trajectories.
Our proposed network architecture, the monster-net, employs a novel T-shaped adaptive kernel with globally and locally adaptive dilation, which efficiently incorporates the global camera shift and handles the local 3D geometry of the target image's pixels to synthesize natural-looking 3D panned views from a single 2D input image.
Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions.
However, the prediction of saliency maps is itself vulnerable to such attacks, even when it is not their direct target.
Monocular depth estimation is usually treated as a supervised regression problem, yet it is closely related to semantic segmentation: both are fundamentally pixel-level classification tasks.
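Casting depth estimation as pixel-level classification typically means discretizing the continuous depth range into bins and predicting a per-pixel class. A minimal sketch of one such discretization, assuming log-spaced bins (the function names, bin count, and depth range here are illustrative, not taken from any specific paper):

```python
import numpy as np

def depth_to_bins(depth, d_min=0.5, d_max=80.0, n_bins=64):
    """Map continuous depth values (meters) to discrete class indices
    using log-spaced bin edges, so nearby depths get finer resolution."""
    edges = np.logspace(np.log10(d_min), np.log10(d_max), n_bins + 1)
    idx = np.clip(np.digitize(depth, edges) - 1, 0, n_bins - 1)
    return idx, edges

def bins_to_depth(idx, edges):
    """Recover a depth estimate as the geometric mean of each bin's edges."""
    return np.sqrt(edges[idx] * edges[idx + 1])

depth = np.array([[0.7, 5.0], [20.0, 79.0]])
idx, edges = depth_to_bins(depth)
recon = bins_to_depth(idx, edges)  # close to `depth`, up to bin width
```

With 64 log-spaced bins over 0.5-80 m, the quantization error stays within a few percent of the true depth, which is one reason classification-style formulations remain competitive with direct regression.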
Unsupervised depth learning takes the appearance difference between a target view and a view synthesized from its adjacent frame as the supervisory signal.
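The appearance difference above is usually measured by a photometric loss that blends SSIM and L1 terms between the target frame and the warped adjacent frame. A minimal NumPy sketch, assuming the view synthesis (which requires predicted depth and camera pose) has already been performed; the function name and the global SSIM simplification are illustrative:

```python
import numpy as np

def photometric_loss(target, synthesized, alpha=0.85):
    """Weighted blend of (1 - SSIM)/2 and L1 difference, averaged over pixels.

    target, synthesized: float arrays of shape (H, W) with values in [0, 1].
    alpha weights the SSIM term, a common choice in self-supervised depth work.
    """
    l1 = np.abs(target - synthesized).mean()
    # Simplified global SSIM; real implementations use local windows.
    mu_t, mu_s = target.mean(), synthesized.mean()
    var_t, var_s = target.var(), synthesized.var()
    cov = ((target - mu_t) * (synthesized - mu_s)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_t * mu_s + c1) * (2 * cov + c2)) / (
        (mu_t ** 2 + mu_s ** 2 + c1) * (var_t + var_s + c2))
    dssim = (1.0 - ssim) / 2.0
    return alpha * dssim + (1.0 - alpha) * l1

img = np.random.rand(8, 8)
assert photometric_loss(img, img) < 1e-6  # identical views -> near-zero loss
```

A perfectly synthesized view drives this loss to zero, so minimizing it jointly trains the depth and ego-motion networks without any ground-truth depth labels.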
To address these challenges, we present ClearGrasp -- a deep learning approach for estimating accurate 3D geometry of transparent objects from a single RGB-D image for robotic manipulation.
Dense depth estimation from a single image is a key problem in computer vision, with exciting applications in a multitude of robotic tasks.