We present a novel differentiable rendering framework for joint geometry, material, and lighting estimation from multi-view images.
Generating robust and reliable correspondences across images is a fundamental task for a wide range of applications.
The first is a Hessian regularization term that smoothly diffuses signed distance values across the entire distance field given noisy and incomplete input.
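As a loose illustration of such a regularizer (a minimal sketch, not the paper's implementation), a finite-difference Hessian penalty on a 2D signed distance grid could be written as:

```python
import numpy as np

def hessian_regularizer(sdf, h=1.0):
    """Finite-difference Hessian penalty for a 2D signed distance grid.

    Penalizes the squared Frobenius norm of the Hessian at interior cells,
    which drives the field toward (near-)linear variation and so diffuses
    distance values smoothly into noisy or unobserved regions.
    """
    f = sdf
    # Second derivatives by central differences (interior cells only).
    fxx = (f[2:, 1:-1] - 2 * f[1:-1, 1:-1] + f[:-2, 1:-1]) / h**2
    fyy = (f[1:-1, 2:] - 2 * f[1:-1, 1:-1] + f[1:-1, :-2]) / h**2
    fxy = (f[2:, 2:] - f[2:, :-2] - f[:-2, 2:] + f[:-2, :-2]) / (4 * h**2)
    return float(np.mean(fxx**2 + fyy**2 + 2 * fxy**2))
```

A perfectly linear field has zero Hessian energy, so minimizing this term alongside a data term pulls the field toward a valid distance function away from the samples.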
We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry.
As such, the adverse influence of occluded pixels is suppressed in the cost fusion.
Ranked #1 on Point Clouds on DTU
Finally, a matchability-aware disparity refinement is introduced to improve the depth inference in weakly matchable regions.
Ranked #1 on Stereo Disparity Estimation on KITTI 2015
In this paper, we introduce a novel network, the Discriminative Feature Network (DFNet), to address the task of unsupervised video object segmentation.
Ranked #1 on Video Object Segmentation on FBMS
In this work, we propose a stochastic bundle adjustment algorithm that approximately decomposes the reduced camera system (RCS) inside the Levenberg-Marquardt (LM) iterations to improve efficiency and scalability.
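For context, the reduced camera system in bundle adjustment comes from eliminating the point block of the normal equations via a Schur complement. A minimal dense sketch (real solvers exploit the block-diagonal structure of the point block rather than forming inverses explicitly):

```python
import numpy as np

def solve_via_rcs(B, E, C, v, w):
    """Solve the bundle-adjustment normal equations
        [B  E ] [dc]   [v]
        [E' C ] [dp] = [w]
    by eliminating the point block C (block-diagonal in practice),
    giving the reduced camera system  S dc = v - E C^{-1} w  with
    S = B - E C^{-1} E' (the Schur complement); points follow by
    back-substitution.
    """
    Cinv = np.linalg.inv(C)          # per-point 3x3 inverses in real BA
    S = B - E @ Cinv @ E.T           # reduced camera system
    dc = np.linalg.solve(S, v - E @ Cinv @ w)
    dp = Cinv @ (w - E.T @ dc)
    return dc, dp
```

The solution matches a direct solve of the full system; the point of the reduction is that S only has the size of the camera block.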
Recent learning-based approaches, in which models are trained on single-view images, have shown promising results for monocular 3D face reconstruction, but they suffer from ambiguities in face pose and depth inherent to this ill-posed problem.
Ranked #6 on 3D Face Reconstruction on REALY (side-view)
In this paper, we present a joint multi-task learning framework for semantic segmentation and boundary detection.
Temporal camera relocalization estimates the pose with respect to each video frame in sequence, as opposed to one-shot relocalization which focuses on a still image.
This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors.
Compared with other computer vision tasks, it is rather difficult to collect a large-scale MVS dataset, as it requires expensive active scanners and a labor-intensive process to obtain ground-truth 3D structures.
The self-supervised learning of depth and pose from monocular sequences provides an attractive solution: by using the photometric consistency of nearby frames as supervision, it depends far less on ground-truth data.
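The photometric consistency supervision above can be sketched as follows (a simplified illustration, not any particular paper's pipeline): warp the source frame to the projected coordinates obtained from estimated depth and relative pose, then compare intensities with the target frame.

```python
import numpy as np

def photometric_loss(target, source, coords):
    """L1 photometric consistency between a target frame and a source frame
    sampled at projected coordinates `coords` (H, W, 2), as produced by
    warping target pixels through estimated depth and relative pose.
    Bilinear sampling keeps the loss differentiable in a real pipeline.
    """
    H, W = target.shape
    x, y = coords[..., 0], coords[..., 1]
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * source[y0, x0] + wx * source[y0, x0 + 1]
    bot = (1 - wx) * source[y0 + 1, x0] + wx * source[y0 + 1, x0 + 1]
    warped = (1 - wy) * top + wy * bot
    return float(np.mean(np.abs(target - warped)))
```

With identity coordinates and identical frames the loss is zero; errors in the predicted depth or pose shift the sampling locations and raise the loss, which is the training signal.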
On the other hand, it learns more efficiently thanks to more effective gradient backpropagation.
Ranked #63 on Semantic Segmentation on NYU Depth v2
Most existing studies on learning local features focus on patch-based descriptions of individual keypoints, while neglecting the spatial relations established by their keypoint locations.
However, one major limitation of current learned MVS approaches is scalability: the memory-consuming cost volume regularization makes learned MVS difficult to apply to high-resolution scenes.
Accurate relative pose is one of the key components in visual odometry (VO) and simultaneous localization and mapping (SLAM).
Convolutional Neural Networks (CNNs) have achieved superior performance on object image retrieval, while Bag-of-Words (BoW) models with handcrafted local features still dominate the retrieval of overlapping images in 3D reconstruction.
Learned local descriptors based on Convolutional Neural Networks (CNNs) have achieved significant improvements on patch-based benchmarks, but have not demonstrated strong generalization on recent benchmarks of image-based 3D reconstruction.
Critical to the registration of point clouds is the establishment of a set of accurate correspondences between points in 3D space.
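Once accurate correspondences are in hand, rigid registration reduces to a closed-form least-squares problem. A minimal sketch using the standard Kabsch/Procrustes solution (an illustration of the general principle, not any specific method from this listing):

```python
import numpy as np

def rigid_from_correspondences(P, Q):
    """Least-squares rigid transform (R, t) aligning matched points P -> Q
    (both N x 3) via the Kabsch algorithm: SVD of the cross-covariance of
    the centred point sets, with a determinant check to rule out reflection.
    """
    mp, mq = P.mean(0), Q.mean(0)
    H = (P - mp).T @ (Q - mq)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # +1 rotation, -1 reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mq - R @ mp
    return R, t
```

With noiseless, correct correspondences this recovers the ground-truth transform exactly, which is why the quality of the correspondence set, not the alignment step, dominates registration accuracy.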
We present an end-to-end deep learning architecture for depth map inference from multi-view images.
Ranked #14 on Point Clouds on Tanks and Temples (Mean F1 (Intermediate) metric)
In this paper, we propose a distributed approach to global bundle adjustment for very large-scale Structure-from-Motion computation.
In this paper, we propose a scale-invariant image matching approach to tackle very large scale variations between views.
In this paper, we tackle, in parallel, the accurate and consistent Structure from Motion (SfM) problem, in particular camera registration, for scenes far exceeding the memory of a single computer.
To solve this problem, we propose a joint optimization in a hierarchical framework to obtain the final surface segments and corresponding optimal camera clusters.
In this paper, we propose a structural segmentation algorithm to partition multi-view stereo reconstructed surfaces of large-scale urban environments into structural segments.
To this end, we propose a segment-based approach to readjust the camera poses locally and improve the reconstruction for fine geometry details.