We present a novel differentiable rendering framework for joint geometry, material, and lighting estimation from multi-view images.
Generating robust and reliable correspondences across images is a fundamental task for a variety of applications.
The first is a Hessian regularization that smoothly diffuses signed distance values over the entire distance field given noisy and incomplete input.
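To make this concrete, here is a minimal PyTorch sketch of a Hessian penalty on a neural SDF, computed via automatic differentiation; `sdf_net` and the sampling of `points` are placeholders, and the exact formulation in the paper may differ:

```python
import torch

def hessian_regularization(sdf_net, points):
    """Penalize the Frobenius norm of the SDF Hessian at sample points,
    encouraging the distance field to diffuse smoothly (a sketch, not
    the paper's exact loss)."""
    points = points.detach().requires_grad_(True)        # (N, 3)
    sdf = sdf_net(points)                                # (N, 1)
    grad = torch.autograd.grad(sdf.sum(), points, create_graph=True)[0]
    rows = []
    for i in range(3):
        # Second derivatives of the SDF w.r.t. each input coordinate.
        rows.append(torch.autograd.grad(grad[:, i].sum(), points,
                                        create_graph=True)[0])
    hessian = torch.stack(rows, dim=1)                   # (N, 3, 3)
    return (hessian ** 2).sum(dim=(1, 2)).mean()
```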
We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry.
In this work, we introduce a novel neural surface reconstruction framework that leverages the knowledge of stereo matching and feature consistency to optimize the implicit surface representation.
The second component is a Seeded Graph Neural Network, which uses seed matches to pass messages within and across images and predicts assignment costs.
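As a rough illustration of seeded message passing, the sketch below (hypothetical names and shapes, not the paper's architecture) lets seed features pool context from all keypoints via attention and broadcast it back:

```python
import torch
import torch.nn as nn

class SeededMessagePassing(nn.Module):
    """One pooling/unpooling round of seed-based message passing
    (a sketch under assumed shapes; dim must be divisible by heads)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.pool = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.unpool = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats, seed_idx):
        # feats: (B, N, D) keypoint features; seed_idx: (B, S) long indices.
        idx = seed_idx.unsqueeze(-1).expand(-1, -1, feats.size(-1))
        seeds = torch.gather(feats, 1, idx)          # (B, S, D)
        # Seeds aggregate context from all keypoints (within-image pooling;
        # the cross-image step attends to the other image's seeds instead).
        seeds, _ = self.pool(seeds, feats, feats)
        # Keypoints retrieve the aggregated seed context (unpooling).
        messages, _ = self.unpool(feats, seeds, seeds)
        return feats + messages
```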
Finally, a matchability-aware disparity refinement is introduced to improve depth inference in weakly matchable regions.
Ranked #1 on Stereo Disparity Estimation on KITTI 2015
In this paper, we introduce a novel network, called discriminative feature network (DFNet), to address the unsupervised video object segmentation task.
Ranked #1 on Video Object Segmentation on FBMS
In this work, we propose a stochastic bundle adjustment algorithm that approximately decomposes the reduced camera system (RCS) inside the Levenberg-Marquardt (LM) iterations to improve efficiency and scalability.
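For reference, the RCS arises from the standard Schur-complement elimination in each LM step; the equations below are the textbook form, which the stochastic method then decomposes approximately:

```latex
% Augmented normal equations of one LM step, with camera updates
% \delta_c and point updates \delta_p:
\begin{equation}
\begin{bmatrix} B & E \\ E^{\top} & C \end{bmatrix}
\begin{bmatrix} \delta_c \\ \delta_p \end{bmatrix}
=
\begin{bmatrix} v \\ w \end{bmatrix}
\end{equation}
% Because C is block-diagonal over points, eliminating \delta_p
% yields the reduced camera system (RCS):
\begin{equation}
\bigl(B - E C^{-1} E^{\top}\bigr)\,\delta_c = v - E C^{-1} w
\end{equation}
```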
Recent learning-based approaches, in which models are trained on single-view images, have shown promising results for monocular 3D face reconstruction, but they suffer from ill-posed face pose and depth ambiguities.
Ranked #6 on 3D Face Reconstruction on REALY (side-view)
Temporal camera relocalization estimates the pose for each frame of a video sequence, as opposed to one-shot relocalization, which focuses on a single still image.
This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors.
In this paper, we leverage a 3D fully convolutional network for 3D point clouds, and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point.
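A minimal sketch of such a joint prediction head, assuming a `backbone` that already produces per-point features (the paper's actual network and scoring scheme may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseDetectDescribe(nn.Module):
    """Predict a descriptor and a detection score for every 3D point
    (a sketch; `backbone` stands in for the 3D fully convolutional
    network and is assumed to return (N, feat_dim) features)."""
    def __init__(self, backbone, feat_dim=32):
        super().__init__()
        self.backbone = backbone
        self.score_head = nn.Linear(feat_dim, 1)

    def forward(self, points):
        feats = self.backbone(points)                  # (N, feat_dim)
        desc = F.normalize(feats, dim=-1)              # unit-length descriptors
        score = torch.sigmoid(self.score_head(feats))  # per-point detection score
        return desc, score.squeeze(-1)
```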
Ranked #2 on Point Cloud Registration on KITTI
Compared with other computer vision tasks, it is rather difficult to collect a large-scale MVS dataset, as doing so requires expensive active scanners and a labor-intensive process to obtain ground-truth 3D structure.
The self-supervised learning of depth and pose from monocular sequences provides an attractive alternative: by using the photometric consistency of nearby frames, it depends far less on ground-truth data.
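A minimal sketch of the photometric consistency term used in this family of methods: warp a source frame into the target view using the predicted depth and relative pose, then compare. The tensor shapes and the plain L1 penalty are assumptions; many methods additionally use SSIM and occlusion masking:

```python
import torch
import torch.nn.functional as F

def photometric_loss(tgt_img, src_img, depth, pose, K):
    """L1 photometric consistency between a target frame and a source
    frame warped into it (a sketch; tgt_img/src_img: (B,3,H,W),
    depth: (B,1,H,W), pose: (B,4,4) target-to-source, K: (B,3,3))."""
    B, _, H, W = tgt_img.shape
    # Pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1).to(depth.device)   # (B,3,HW)
    # Back-project to 3D, move into the source frame, reproject.
    cam = torch.inverse(K) @ pix * depth.view(B, 1, -1)           # (B,3,HW)
    cam_h = torch.cat([cam, torch.ones_like(cam[:, :1])], 1)      # (B,4,HW)
    src_cam = (pose @ cam_h)[:, :3]                               # (B,3,HW)
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)
    # Normalize to [-1, 1] for grid_sample.
    gx = 2 * src_pix[:, 0] / (W - 1) - 1
    gy = 2 * src_pix[:, 1] / (H - 1) - 1
    grid = torch.stack([gx, gy], -1).view(B, H, W, 2)
    warped = F.grid_sample(src_img, grid, align_corners=True)
    return (warped - tgt_img).abs().mean()
```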
First, to capture the local context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment matrix.
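A simplified sketch of clustering by a learned soft assignment matrix (hypothetical module, not the paper's exact differentiable pooling layer):

```python
import torch
import torch.nn as nn

class SoftAssignPooling(nn.Module):
    """Cluster N unordered correspondences into M clusters via a
    learned soft assignment matrix (a simplified sketch)."""
    def __init__(self, in_dim, num_clusters):
        super().__init__()
        self.assign = nn.Linear(in_dim, num_clusters)

    def forward(self, x):
        # x: (B, N, D) per-correspondence features (e.g., embedded matches).
        S = torch.softmax(self.assign(x), dim=1)  # (B, N, M): each cluster's
                                                  # weights sum to 1 over N
        clusters = S.transpose(1, 2) @ x          # (B, M, D) weighted averages
        return clusters, S
```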
On the other hand, it learns more efficiently because gradients backpropagate more effectively.
Ranked #63 on Semantic Segmentation on NYU Depth v2
Most existing studies on learning local features focus on patch-based descriptions of individual keypoints while neglecting the spatial relations established by their keypoint locations.
However, one major limitation of current learned MVS approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes.
Accurate relative pose is one of the key components in visual odometry (VO) and simultaneous localization and mapping (SLAM).
Convolutional Neural Networks (CNNs) have achieved superior performance on object image retrieval, while Bag-of-Words (BoW) models with handcrafted local features still dominate the retrieval of overlapping images in 3D reconstruction.
Learned local descriptors based on Convolutional Neural Networks (CNNs) have achieved significant improvements on patch-based benchmarks, but have not demonstrated strong generalization on recent benchmarks of image-based 3D reconstruction.
Critical to the registration of point clouds is the establishment of a set of accurate correspondences between points in 3D space.
We present an end-to-end deep learning architecture for depth map inference from multi-view images.
Ranked #14 on Point Clouds on Tanks and Temples (Mean F1 (Intermediate) metric)
In this paper, we propose a distributed approach to global bundle adjustment for very large-scale Structure-from-Motion computation.
In this paper, we propose a scale-invariant image matching approach to tackle the very large scale variations between views.
In this paper, we tackle the accurate and consistent Structure-from-Motion (SfM) problem, in particular camera registration, in parallel for scenes far exceeding the memory of a single computer.
While extracting derivative colors from achromatic regions to approximate the illuminant color is fairly straightforward, the success of our extraction in highlight regions is attributed to the different rates of variation of the diffuse and specular magnitudes in the dichromatic reflection model.
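The dichromatic reflection model referenced here has the standard form below; because the diffuse magnitude varies slowly across a highlight while the specular magnitude varies rapidly, the specular color (approximately the illuminant) can be separated:

```latex
% Dichromatic reflection model: observed color as a sum of body (diffuse)
% and interface (specular) reflection, with geometry-dependent magnitudes.
\begin{equation}
\mathbf{I}(x) = m_d(x)\,\boldsymbol{\Lambda} + m_s(x)\,\boldsymbol{\Gamma}
\end{equation}
% \Lambda: diffuse (body) color; \Gamma: specular color, approximately
% the illuminant color; m_d varies slowly across a highlight while m_s
% varies rapidly, which enables illuminant extraction in highlights.
```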
Line segments are prominent features in man-made environments and supply rich geometrical and structural information about a scene, which not only helps guide toward a correct warp in low-texture conditions but also prevents the undesired distortion induced by warping.
In this paper, we propose a structural segmentation algorithm to partition multi-view stereo reconstructed surfaces of large-scale urban environments into structural segments.
To solve this problem, we propose a joint optimization in a hierarchical framework to obtain the final surface segments and corresponding optimal camera clusters.
As an extension of SIFT, our method adds priors to solve the ill-posed affine parameter estimation problem and normalizes the parameters directly; it is applicable to objects with regular structures.
To this end, we propose a segment-based approach to readjust the camera poses locally and improve the reconstruction for fine geometry details.