In this work, we introduce a novel, end-to-end trainable CNN-based architecture that delivers high-quality results for both grasp detection with a parallel-plate gripper and semantic segmentation.
For this step, we propose a novel differentiable method for rendering the polygonal shapes of these proposals.
We explore how a general AI algorithm can be used for 3D scene understanding to reduce the need for training data.
We therefore show how we can calculate a normalization based on the expected 3D error, which we can then use to normalize the label jumps in the CRF.
In fast-developing countries, it is hard to track the construction of new buildings or the demolition of old structures and, as a result, to keep cadastre maps up to date.
We propose a machine learning based approach for automatic regularization and polygonization of building segmentation masks.
In this paper we present a method for building boundary refinement and regularization in satellite images using a fully convolutional neural network trained with a combination of adversarial and regularized losses.
We propose three novel solvers for estimating the relative pose of a multi-camera system from affine correspondences (ACs).
Many researchers have proposed that combining deep neural networks with graphical models can create more efficient and better-regularized composite models.
In order to deal with occlusions between components of the layout, which is a problem ignored by previous works, we introduce an analysis-by-synthesis method to iteratively refine the 3D layout estimate.
In this paper we present four minimal cases for two-view relative pose estimation that exploit the affine transformation between feature points, and we derive efficient solvers for each case.
Deep Neural Networks (DNNs) have the potential to improve the quality of image-based 3D reconstructions.
We propose a simple yet effective method to learn to segment new indoor scenes from video frames. State-of-the-art methods trained on one dataset, even one as large as SUNRGB-D, can perform poorly when applied to images outside that dataset because of dataset bias, a common phenomenon in computer vision.
We show that it is possible to learn semantic segmentation from very limited amounts of manual annotations, by enforcing geometric 3D constraints between multiple views.
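The core idea of enforcing geometric 3D constraints between views can be illustrated with a minimal sketch: given depth, intrinsics, and relative pose, a pixel in one view is reprojected into another, and the predicted semantic labels at corresponding pixels can be encouraged to agree. The pinhole model and the function below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def reproject(u, v, depth, K, R, t):
    """Map pixel (u, v) with known depth from view 1 into view 2.

    K: 3x3 intrinsics; (R, t): pose of view 2 relative to view 1.
    Labels predicted at (u, v) and at the returned pixel should agree.
    """
    # Back-project the pixel to a 3D point in view 1's camera frame.
    p = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Transform into view 2's frame and project back to the image.
    q = K @ (R @ p + t)
    return q[0] / q[2], q[1] / q[2]

# Example: identity rotation, small sideways translation.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
u2, v2 = reproject(320.0, 240.0, depth=2.0,
                   K=K, R=np.eye(3), t=np.array([0.1, 0.0, 0.0]))
# The principal point at 2 m depth shifts horizontally to (345, 240).
```

During training, a consistency loss between the label at (u, v) in view 1 and at (u2, v2) in view 2 propagates sparse annotations across many frames.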
We propose a method for urban 3D reconstruction, which incorporates semantic information and plane priors within the reconstruction process in order to generate visually appealing 3D models.
This paper addresses the highly challenging problem of automatically detecting man-made structures, especially buildings, in very high resolution (VHR) synthetic aperture radar (SAR) images.
Furthermore, due to the design of the network, only the 2D camera images are required at test time for classification, which enables usage in portable computer vision systems.
While interest in deep models for single-image depth estimation is increasing, established schemes for their evaluation are still limited.
In the second step, we rank the resulting view clusters (i.e., key views with their matching partners) according to their impact on the fulfillment of desired quality parameters such as completeness, ground resolution and accuracy.
In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with.
To minimize the number of cameras needed for surround perception, we utilize fisheye cameras.
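As a generic illustration of why fisheye cameras cover such a wide field of view (the standard equidistant model, not necessarily the calibration used in this work): the angle θ between a ray and the optical axis maps linearly to image radius, r = f·θ, so even rays at 90° off-axis still land inside the image.

```python
import numpy as np

def equidistant_project(X, f, cx, cy):
    """Project a 3D point with the equidistant fisheye model: r = f * theta."""
    x, y, z = X
    theta = np.arctan2(np.hypot(x, y), z)  # angle from the optical axis
    phi = np.arctan2(y, x)                 # azimuth around the axis
    r = f * theta                          # linear angle-to-radius mapping
    return cx + r * np.cos(phi), cy + r * np.sin(phi)

# A point 90 degrees off-axis (z = 0) still projects into the image,
# which a pinhole camera cannot achieve.
u, v = equidistant_project((1.0, 0.0, 0.0), f=300.0, cx=640.0, cy=480.0)
```

This linear mapping is what lets a small number of fisheye cameras achieve full surround coverage.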
In this paper we present a scalable approach for robustly computing a 3D surface mesh from multi-scale multi-view stereo point clouds that can handle extreme jumps of point density (in our experiments three orders of magnitude).
In this paper we present an autonomous system for acquiring close-range high-resolution images that maximize the quality of a subsequent 3D reconstruction with respect to coverage, ground resolution and 3D uncertainty.
Learned confidence measures are gaining importance for outlier removal and quality improvement in stereo vision.
In this paper, we present our minimal 4-point and linear 8-point algorithms to estimate the relative pose of a multi-camera system with known vertical directions, i.e., known absolute roll and pitch angles.